Different Joins in Pandas: A Comprehensive Guidewith Real – Life Examples

This blog post covers the pandas joins with code examples. Learn the different joins and upskill yourself in the domain of data and analysis.

 

Python have thosuands of open-source library available, out of that Pandas library is very popular and widely used library.

Pandas is favourite to all data enthusiasts. It helps us to do multiple task like File handling, data manipulation, data analysis and many more. One of the most powerful features of Pandas is its ability to join and merge datasets efficiently.

But with multiple types of joins available, it can be tricky to know which one to use in different scenarios.

In this article, we shall break down the different types of joins in Pandas – inner join, left join, right join, and outer join   – with multiple coding examples and real-life use cases.

 

Whether you are a beginner or a pro, this article will help you master Pandas joins. Practice code along with reading blog post. If you have not installed python coding playground yet, the use Google Colaboratory , copy code and run on colaboratory.

 

🔹 What are joins in Pandas?

A join in Pandas refers to merging two datasets (DataFrames) based on a common column. This is similar to SQL  joins but also works within Python’s Pandas library. We use the merge() function to perform joins, specifying how we want to merge the data.

 

Commonly Used Join Types:

  1. Inner Join – Returns only the matching rows.
  2. Left Join – Keeps all rows from the left DataFrame and matches from the right.
  3. Right Join – Keeps all rows from the right DataFrame and matches from the left.
  4. Outer Join (Full Join) – Returns all rows, filling in missing values where necessary.

 

1️⃣ Inner Join: Keeping Only Matching Records

An inner join returns only the rows that have matching values in both DataFrames.

📌 Real-Life Example:

Imagine you have a list of customers who made purchases and another list of loyalty program members. You want to see which customers are also loyalty members.

✅ Code Example:

import pandas as pd

# Creating first DataFrame
customers = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
})

# Creating second DataFrame
loyalty_members = pd.DataFrame({
    'Customer_ID': [2, 3, 5],
    'Membership_Level': ['Gold', 'Silver', 'Bronze']
})

# Performing an inner join
result = pd.merge(customers, loyalty_members, on='Customer_ID', how='inner')
print(result)

🔍 Output:

   Customer_ID    Name Membership_Level
0           2     Bob             Gold
1           3  Charlie          Silver

🔥 Takeaway:

Only customers Bob and Charlie appear because they exist in both DataFrames.

 

2️⃣ Left Join: Keeping All Left Table Data

A left join keeps all records from the left DataFrame and adds matching rows from the right. If there’s no match, it fills with NaN.

📌 Real-Life Example:

You have an employee list and another table with their salaries, but not all employees have a recorded salary yet.

✅ Code Example:

employees = pd.DataFrame({
    'Emp_ID': [101, 102, 103, 104],
    'Name': ['John', 'Jane', 'Jake', 'Jill']
})

salaries = pd.DataFrame({
    'Emp_ID': [102, 103],
    'Salary': [70000, 80000]
})

# Performing a left join
result = pd.merge(employees, salaries, on='Emp_ID', how='left')
print(result)

🔍 Output:

   Emp_ID  Name   Salary
0    101  John      NaN
1    102  Jane  70000.0
2    103  Jake  80000.0
3    104  Jill      NaN

🔥 Takeaway:

John and Jill don’t have salaries recorded, so their values show as NaN.


3️⃣ Right Join: Prioritizing the Right Table

A right join is the opposite of a left join: it keeps all records from the right DataFrame while keeping matching records from the left.

📌 Real-Life Example:

Imagine a store has a product catalog, and a separate list records recent sales. You want to keep all sold products, even if they don’t exist in the full catalog.

✅ Code Example:

products = pd.DataFrame({
    'Product_ID': [1, 2, 3],
    'Product_Name': ['Laptop', 'Phone', 'Tablet']
})

sales = pd.DataFrame({
    'Product_ID': [2, 3, 4],
    'Quantity_Sold': [5, 7, 2]
})

# Performing a right join
result = pd.merge(products, sales, on='Product_ID', how='right')
print(result)

🔍 Output:

   Product_ID Product_Name  Quantity_Sold
0          2       Phone             5
1          3      Tablet             7
2          4         NaN             2

🔥 Takeaway:

The right DataFrame (sales) is prioritized, so even Product ID 4, which isn’t in the catalog, appears.


4️⃣ Outer Join: Getting Everything

An outer join returns all rows from both DataFrames, filling missing values with NaN where necessary.

📌 Real-Life Example:

You have a student list and another list of students who passed a test. You want to see all students, marking those who passed.

✅ Code Example:

students = pd.DataFrame({
    'Student_ID': [1, 2, 3, 4],
    'Name': ['Emma', 'Liam', 'Olivia', 'Noah']
})

passed_students = pd.DataFrame({
    'Student_ID': [2, 3, 5],
    'Passed': ['Yes', 'Yes', 'Yes']
})

# Performing an outer join
result = pd.merge(students, passed_students, on='Student_ID', how='outer')
print(result)

🔍 Output:

   Student_ID     Name Passed
0          1    Emma    NaN
1          2    Liam    Yes
2          3  Olivia    Yes
3          4    Noah    NaN
4          5     NaN    Yes

🔥 Takeaway:

We see all students and all pass results, even if they don’t match.

 

🎯 Conclusion

Mastering joins in Pandas allows us to handle and analyze data effectively. Let’s Observe the quick recap of joins:

  • Inner Join → Only matching data.
  • Left Join → All left data, with matching right.
  • Right Join → All right data, with matching left.
  • Outer Join → Everything, filling missing values.

Happy Coding!

Happy Learning!!

×