Introduction
In real-world projects, your data is rarely stored in a single file.
You often need to combine multiple datasets.
Pandas provides three powerful ways to do this:
- Merge: SQL-style joins
- Join: Index-based combining
- Concat: Stacking data
Why Combining Data is Important
You may need to:
- Combine customer and order data
- Merge datasets from different sources
- Append new data to existing data
This is a core skill in data engineering and analytics.
1. Pandas Merge
Merge combines two DataFrames on one or more key columns, much like a SQL join.
Syntax
pd.merge(df1, df2, on="ID")
Types of Merge
Inner Join (Default)
Returns only the records whose key appears in both DataFrames.
pd.merge(df1, df2, on="ID", how="inner")
Left Join
Returns all records from the left DataFrame. Columns from the right DataFrame are null where there is no match.
pd.merge(df1, df2, on="ID", how="left")
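Right Join
Returns all records from the right DataFrame. Columns from the left DataFrame are null where there is no match.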
pd.merge(df1, df2, on="ID", how="right")
Outer Join
Returns all records from both DataFrames, with nulls wherever one side has no match.
pd.merge(df1, df2, on="ID", how="outer")
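To make the four join types concrete, here is a minimal sketch on two tiny DataFrames. The sample data (the names and amounts) is made up for illustration; only the `ID` key column comes from the snippets above.

```python
import pandas as pd

# Hypothetical sample data: IDs 1-3 on the left, IDs 2-4 on the right.
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Ann", "Bob", "Cara"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Amount": [250, 120, 90]})

inner = pd.merge(df1, df2, on="ID", how="inner")  # keys in both: 2, 3
left = pd.merge(df1, df2, on="ID", how="left")    # all left keys: 1, 2, 3
right = pd.merge(df1, df2, on="ID", how="right")  # all right keys: 2, 3, 4
outer = pd.merge(df1, df2, on="ID", how="outer")  # every key: 1, 2, 3, 4

# Print which IDs survive each join type.
for name, result in [("inner", inner), ("left", left),
                     ("right", right), ("outer", outer)]:
    print(name, sorted(result["ID"].tolist()))
```

In the left join, ID 1 has no matching row in `df2`, so its `Amount` is null.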
2. Pandas Join
Join combines DataFrames on their indexes instead of on columns. It defaults to a left join.
df1.join(df2, how="inner")
Best when both DataFrames already share a meaningful index.
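As a rough illustration of index-based joining, the sketch below uses two small DataFrames that share a customer ID index. The data and the names `customers` and `totals` are assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample data, both indexed by customer ID.
customers = pd.DataFrame({"Name": ["Ann", "Bob", "Cara"]}, index=[1, 2, 3])
totals = pd.DataFrame({"Total": [250, 120]}, index=[2, 3])

# Default is a left join on the index: every customer row is kept,
# and customers without a total get a null in the Total column.
left_joined = customers.join(totals)

# how="inner" keeps only index labels present in both DataFrames.
inner_joined = customers.join(totals, how="inner")

print(left_joined)
print(inner_joined)
```

If the key lives in a column rather than the index, calling `set_index` first or using `pd.merge` is usually the simpler route.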
3. Pandas Concatenate
Concat stacks DataFrames on top of each other or side by side, without matching on a key column.
Row-wise (Vertical)
pd.concat([df1, df2], axis=0)
Column-wise (Horizontal)
pd.concat([df1, df2], axis=1)
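The two directions can be sketched as follows. The monthly order data is invented for illustration, and `ignore_index=True` is an optional extra (not used in the snippets above) that renumbers the rows after stacking.

```python
import pandas as pd

# Hypothetical sample data: two months of orders with the same columns.
jan = pd.DataFrame({"OrderID": [1, 2], "Amount": [100, 200]})
feb = pd.DataFrame({"OrderID": [3, 4], "Amount": [150, 50]})

# Row-wise: stack feb under jan and renumber the index 0..3.
stacked = pd.concat([jan, feb], axis=0, ignore_index=True)

# Column-wise: place columns side by side, aligning rows by index label.
regions = pd.DataFrame({"Region": ["East", "West"]})
side_by_side = pd.concat([jan, regions], axis=1)

print(stacked)
print(side_by_side)
```

Without `ignore_index=True`, the stacked result would keep the original labels 0, 1, 0, 1, which can cause surprises in later label-based lookups.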
Real-World Example
import pandas as pd

customers = pd.read_csv("customers.csv")
orders = pd.read_csv("orders.csv")

# Merge datasets
df = pd.merge(customers, orders, on="CustomerID", how="inner")
print(df.head())
Key Differences
| Method | Use Case |
|---|---|
| Merge | SQL-style joins |
| Join | Index-based combining |
| Concat | Stacking data |
Best Practices
- Always check common columns before merge
- Use correct join type
- Clean data before combining
- Verify results after merging
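One possible way to apply these checks in code is sketched below. It uses the `validate` and `indicator` parameters of `pd.merge`; the sample data is hypothetical, and the choice of an outer join is only there to make unmatched rows visible.

```python
import pandas as pd

# Hypothetical sample data: customer 2 has two orders, customer 4 is unknown.
customers = pd.DataFrame({"CustomerID": [1, 2, 3],
                          "Name": ["Ann", "Bob", "Cara"]})
orders = pd.DataFrame({"CustomerID": [2, 2, 4],
                       "Amount": [250, 120, 90]})

# Check the common column exists on both sides before merging.
common = set(customers.columns) & set(orders.columns)
print("Common columns:", common)

merged = pd.merge(
    customers,
    orders,
    on="CustomerID",
    how="outer",
    validate="one_to_many",  # raises MergeError if customer keys repeat
    indicator=True,          # adds a _merge column: left_only/right_only/both
)

# Verify the result: count how many rows matched on each side.
print(merged["_merge"].value_counts())
```

Rows marked `left_only` or `right_only` are the ones that would be dropped by an inner join, so they are worth inspecting before choosing a join type.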
Common Mistakes
- Merging on the wrong column
- Overlooking duplicate column names (pandas adds _x and _y suffixes)
- Unexpected null values after merge
- Using concat when a key-based merge is needed
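Two of these mistakes are easy to reproduce. The sketch below, on made-up data, shows the suffixes pandas adds when both DataFrames share a non-key column name, and how to count the nulls a left join introduces. The `Status` column and the suffix names are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical sample data: both frames have a non-key "Status" column.
customers = pd.DataFrame({"CustomerID": [1, 2],
                          "Status": ["active", "inactive"]})
orders = pd.DataFrame({"CustomerID": [2, 3],
                       "Status": ["shipped", "pending"]})

# Default behavior: overlapping column names get _x and _y suffixes.
default = pd.merge(customers, orders, on="CustomerID", how="left")
print(default.columns.tolist())

# Explicit suffixes make the origin of each column clear.
named = pd.merge(customers, orders, on="CustomerID", how="left",
                 suffixes=("_customer", "_order"))

# Customer 1 has no order, so its order status is null after the left join.
missing = named["Status_order"].isna().sum()
print("Rows without a matching order:", missing)
```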
When to Use What?
- Use merge() when datasets share a key column
- Use join() when working with indexes
- Use concat() when stacking similar data
External Resources
- Pandas Merge Docs: https://pandas.pydata.org/docs/reference/api/pandas.merge.html
- Pandas Concat Docs: https://pandas.pydata.org/docs/reference/api/pandas.concat.html
Conclusion
Combining data is one of the most important tasks in data analysis.
With Pandas:
- merge() helps combine relational data
- join() simplifies index merging
- concat() stacks datasets easily
Master these tools to handle real-world data efficiently.