How to Handle Missing Data in Pandas (With Examples)

In real-world datasets, missing values are unavoidable. Whether you’re working with user data, surveys, or financial records, you’ll often encounter empty or null entries.

If not handled properly, missing data can:

  • ❌ Break your analysis
  • ❌ Reduce accuracy of models
  • ❌ Lead to incorrect insights

👉 That’s why data cleaning is one of the most important steps in data analysis.


❓ What is Missing Data?

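Pandas usually represents missing data as:

  • NaN (Not a Number), the standard marker in numeric columns
  • None, Python's null object, which pandas also treats as missing
  • NaT (Not a Time), the marker for missing dates and times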

Example:

import pandas as pd
import numpy as np

data = {
    "Name": ["Sagar", "Aman", "Riya"],
    "Age": [25, np.nan, 22]
}
df = pd.DataFrame(data)
print(df)
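To see that pandas treats both markers the same way, here is a small sketch with made-up data: a None placed in the numeric Age column is stored as NaN (the column becomes float64), and isna() flags the gaps in both columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sagar", None, "Riya"],  # text column with a None
    "Age": [25, None, 22],            # numeric column: None is stored as NaN
})

print(df["Age"].dtype)  # float64, because NaN is a float
print(df.isna())        # True marks a missing cell, whichever marker was used
```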

🔍 Detecting Missing Values

Before fixing missing data, you need to find it.

df.isnull()       # Returns True/False
df.isnull().sum() # Count missing values per column

👉 Always start with this step!
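Beyond raw counts, it often helps to see what share of each column is missing and which rows are affected. A minimal sketch on an invented four-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sagar", "Aman", "Riya", None],
    "Age": [25, np.nan, 22, np.nan],
})

missing_count = df.isnull().sum()                # missing cells per column
missing_pct = df.isnull().mean() * 100           # share of missing cells per column, in %
rows_with_missing = df[df.isnull().any(axis=1)]  # rows with at least one gap

print(missing_count)
print(missing_pct)
print(rows_with_missing)
```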


🧹 Removing Missing Data

1. Drop Rows with Missing Values

df.dropna()

2. Drop Columns with Missing Values

df.dropna(axis=1)

3. Drop Only if All Values are Missing

df.dropna(how='all')

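⚠️ Use carefully: this can remove important data. Also note that dropna() returns a new DataFrame and leaves df unchanged; assign the result (df = df.dropna()) to keep it.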
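The subset and thresh parameters give finer control than dropping every incomplete row. A sketch with invented data, showing which rows each variant keeps:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sagar", "Aman", None, "Riya"],
    "Age": [25, np.nan, np.nan, 22],
    "City": ["Pune", np.nan, np.nan, np.nan],
})

any_missing = df.dropna()               # keeps only fully complete rows
by_column = df.dropna(subset=["Name"])  # only a missing Name triggers a drop
at_least_two = df.dropna(thresh=2)      # keeps rows with >= 2 non-missing values

# df itself still has all 4 rows, since each call returned a new DataFrame
print(len(df), len(any_missing), len(by_column), len(at_least_two))
```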


🔄 Filling Missing Data

1. Fill with a Constant Value

df.fillna(0)
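A single constant rarely suits every column: 0 makes sense for a count but not for a name. fillna() also accepts a dict of column-to-value pairs; the fill values below ("Unknown", 0) are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sagar", None, "Riya"],
    "Age": [25, np.nan, 22],
})

# Each column gets its own fill value; df itself is left unchanged
filled = df.fillna({"Name": "Unknown", "Age": 0})
print(filled)
```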

2. Fill with Mean / Median

df["Age"] = df["Age"].fillna(df["Age"].mean())  # or df["Age"].median()

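👉 Works only for numeric columns. Prefer the median when a column has outliers, since extreme values pull the mean toward them.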
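A small sketch with made-up ages, where one extreme value (90) shows how far the mean and median fills can differ:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22, 25, 24, np.nan, 90])  # 90 is an outlier

mean_filled = ages.fillna(ages.mean())      # mean is pulled up by the outlier
median_filled = ages.fillna(ages.median())  # median stays near the typical values

print(ages.mean(), ages.median())  # 40.25 24.5
```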


3. Forward Fill (ffill)

df.ffill()  # df.fillna(method='ffill') is deprecated in recent pandas

👉 Fills each gap with the last valid value before it.
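A minimal sketch on a hypothetical daily temperature series. Series.ffill() carries the last reading forward, and its limit parameter caps how many consecutive gaps are filled, so long outages are not papered over with stale values:

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with a two-day gap and a one-day gap
temps = pd.Series(
    [21.0, np.nan, np.nan, 23.5, np.nan],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

filled = temps.ffill()          # every gap takes the last reading before it
limited = temps.ffill(limit=1)  # only the first day of each gap is filled

print(filled)
print(limited)
```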


4. Backward Fill (bfill)

df.bfill()  # df.fillna(method='bfill') is deprecated in recent pandas

👉 Fills each gap with the next valid value after it.
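Backward fill cannot fill a gap at the very end of the data, and forward fill cannot fill one at the very start. A common pattern, sketched here on invented values, chains the two:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 10.0, np.nan, 30.0, np.nan])

back = s.bfill()          # trailing gap stays NaN: there is no later value
both = s.ffill().bfill()  # ffill first, then bfill covers the leading gap

print(back)
print(both)
```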


🧠 Advanced Techniques

🔹 Interpolation

df["Age"].interpolate()

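👉 Estimates missing values from neighboring values. The default, method='linear', treats the points as evenly spaced and draws a straight line across each gap.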
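A sketch with invented numbers. With the default linear method the gaps between 10 and 40 become 20 and 30; on a DatetimeIndex, method="time" instead weights by the actual time between readings:

```python
import numpy as np
import pandas as pd

# Positional index: linear interpolation treats points as evenly spaced
s = pd.Series([10.0, np.nan, np.nan, 40.0])
linear = s.interpolate()  # default method="linear"

# DatetimeIndex: method="time" weights by the real gaps between dates
ts = pd.Series(
    [10.0, np.nan, 40.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
)
by_time = ts.interpolate(method="time")  # Jan 2 is 1 of 3 days along, so about 20

print(linear)
print(by_time)
```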


🔹 Fill Based on Conditions

mask = df["Age"].isna() & df["Name"].notna()  # missing Age, but Name present
df.loc[mask, "Age"] = df["Age"].mean()
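Another common condition is the group a row belongs to, for example filling a missing age with the average of the same department. A sketch, assuming a hypothetical Dept column that the original example does not have:

```python
import numpy as np
import pandas as pd

# "Dept" is a hypothetical grouping column, used only for illustration
df = pd.DataFrame({
    "Name": ["Sagar", "Aman", "Riya", "Neha"],
    "Dept": ["IT", "IT", "HR", "HR"],
    "Age": [25, np.nan, 40, np.nan],
})

# transform("mean") returns each row's group mean, aligned to df's index
group_mean = df.groupby("Dept")["Age"].transform("mean")
df["Age"] = df["Age"].fillna(group_mean)
print(df)
```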

🔹 Replace Specific Missing Values

df.replace(np.nan, 0)
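Raw files often mark gaps with placeholder strings such as "?" or "N/A", which isnull() does not recognize. A sketch that converts them to NaN first; the placeholder list is an assumption, so check which markers your data actually uses:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sagar", "?", "Riya"],
    "City": ["Pune", "N/A", ""],
})

print(df.isnull().sum().sum())  # 0: placeholders are ordinary strings to pandas

# Turn the placeholder markers into real missing values
cleaned = df.replace(["?", "N/A", ""], np.nan)
print(cleaned.isnull().sum())
```

When loading a CSV, the na_values parameter of pd.read_csv can apply the same conversion at read time.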

⚡ Real-World Example

import pandas as pd

df = pd.read_csv("employees.csv")
print(df.isnull().sum())  # Check missing values
df["Salary"] = df["Salary"].fillna(df["Salary"].mean())  # Fill salary with mean
df = df.dropna(subset=["Name"])  # Drop rows where Name is missing
print(df.head())
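Since employees.csv isn't provided, here is a self-contained variant that reads an in-memory CSV through io.StringIO (the rows are invented). It assigns the fillna() result back to the column: calling inplace=True on a single selected column is chained assignment, which recent pandas versions warn about and which does not update df once Copy-on-Write is the default in pandas 3.0.

```python
import io

import pandas as pd

# Stand-in for employees.csv; the rows are made up for illustration
csv_text = """Name,Salary
Sagar,50000
Aman,
,42000
Riya,58000
"""

df = pd.read_csv(io.StringIO(csv_text))  # empty fields are read as NaN
print(df.isnull().sum())

df["Salary"] = df["Salary"].fillna(df["Salary"].mean())  # reassign instead of inplace
df = df.dropna(subset=["Name"])                           # drop rows without a Name
print(df)
```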

🎯 Best Practices

  • ✔️ Always analyze missing data first
  • ✔️ Use mean/median for numeric data
  • ✔️ Use forward fill for time-series data
  • ✔️ Avoid blindly dropping rows

🚫 Common Mistakes

  • ❌ Dropping too much data
  • ❌ Filling with wrong values
  • ❌ Ignoring missing data completely
  • ❌ Not checking dataset after cleaning
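One way to guard against the last two mistakes is to compare the data before and after cleaning. The cleaning_report helper below is hypothetical, a sketch of the idea rather than a pandas API:

```python
import numpy as np
import pandas as pd


def cleaning_report(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Summarize what a cleaning step changed (hypothetical helper)."""
    dropped = len(before) - len(after)
    return {
        "rows_before": len(before),
        "rows_after": len(after),
        "rows_dropped_pct": 100 * dropped / len(before) if len(before) else 0.0,
        "missing_after": int(after.isnull().sum().sum()),
    }


raw = pd.DataFrame({"Name": ["Sagar", "Aman", None, "Riya"], "Age": [25, np.nan, 30, 22]})
clean = raw.dropna(subset=["Name"])
clean = clean.assign(Age=clean["Age"].fillna(clean["Age"].median()))

report = cleaning_report(raw, clean)
print(report)
```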


🏁 Conclusion

Handling missing data is a critical step in data preprocessing. Pandas provides powerful tools to:

  • Detect missing values
  • Remove unnecessary data
  • Fill gaps intelligently

👉 Mastering this will make your data analysis far more accurate and reliable.


🔖 Hashtags

#Pandas #DataCleaning #Python #DataScience #MachineLearning #AI #DataAnalysis #BigData #LearnPython #Analytics
