What are DataFrames in Python? See examples

A DataFrame is a 2-dimensional, tabular data structure, much like an Excel sheet or SQL table, where data is organized into rows and columns.

Whether you're building:

  • Data analysis tools
  • Machine learning models
  • Dashboards

👉 DataFrames are your foundation.


📊 What is a DataFrame?

A DataFrame is:

  • A table with rows (index) and columns
  • Each column can have a different data type
  • Labeled axes for easy data handling

🔑 Key Characteristics:

  • Mutable (you can modify data)
  • Size can change dynamically
  • Supports multiple data types
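
To make these characteristics concrete, here is a minimal sketch. The sample names, the Score column, and the extra row are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Name": ["Sagar", "Aman", "Riya"],   # text column
    "Age": [25, 23, 22],                 # integer column
    "Score": [88.5, 92.0, 79.5],         # float column
})

# Each column keeps its own data type.
print(df.dtypes)

# Mutable: change a single value in place.
df.loc[0, "Age"] = 26

# Size can change: add a column, then append a row.
df["Passed"] = df["Score"] >= 80
new_row = pd.DataFrame([{"Name": "Neha", "Age": 24, "Score": 81.0, "Passed": True}])
df = pd.concat([df, new_row], ignore_index=True)

print(df.shape)  # (4, 4)
```

The DataFrame started with 3 rows and 3 columns and ends with 4 of each.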

🛠️ Creating a DataFrame

1. From a Dictionary

import pandas as pd

data = {
    "Name": ["Sagar", "Aman", "Riya"],
    "Age": [25, 23, 22],
    "City": ["Delhi", "Mumbai", "Chandigarh"]
}

df = pd.DataFrame(data)
print(df)

2. From a List of Lists

data = [
    ["Sagar", 25],
    ["Aman", 23],
    ["Riya", 22]
]

df = pd.DataFrame(data, columns=["Name", "Age"])
print(df)

3. From CSV File

df = pd.read_csv("data.csv")
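
If you don't have a data.csv file at hand, you can still try read_csv. The sketch below feeds it CSV text from memory; the file contents are an assumption, reusing the sample values from the dictionary example.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real "data.csv" file.
csv_text = """Name,Age,City
Sagar,25,Delhi
Aman,23,Mumbai
Riya,22,Chandigarh
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)  # (3, 3)
```

The first line of the CSV becomes the column names, and each following line becomes a row.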

🔍 Understanding DataFrame Structure

🧠 Important Components:

  • Index → Row labels
  • Columns → Column names
  • Values → Actual data

Example:

print(df.index)
print(df.columns)
print(df.values)
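
Here is a small self-contained sketch of the three components. The custom row labels "a", "b", "c" are an assumption, chosen so the index is easy to spot. It uses to_numpy(), which the pandas documentation recommends over .values.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Sagar", "Aman", "Riya"], "Age": [25, 23, 22]},
    index=["a", "b", "c"],   # custom row labels, assumed for illustration
)

print(list(df.index))     # ['a', 'b', 'c']
print(list(df.columns))   # ['Name', 'Age']
print(df.to_numpy())      # the underlying data as a NumPy array
```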

🔎 Exploring Data

Before analysis, always explore your dataset:

df.head()      # First 5 rows
df.tail()      # Last 5 rows
df.info()      # Column data types & non-null counts
df.describe()  # Statistical summary of numeric columns

👉 These functions help you quickly understand your data.
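
As a quick sketch of exploration on a tiny dataset, the example below also counts missing values with isna().sum(). That call is not covered above and is added here as one common way to do it. The data, including the deliberately missing age, is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sagar", "Aman", "Riya", "Neha"],
    "Age": [25, None, 22, 24],   # one missing value on purpose
})

# head() takes an optional row count; the default is 5.
print(df.head(2))

# Count missing values per column.
missing = df.isna().sum()
print(missing)

# describe() skips missing values when computing statistics.
print(df.describe())
```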


🎯 Selecting Data

📌 Select a Column

df["Name"]

📌 Select Multiple Columns

df[["Name", "Age"]]

📌 Select Rows

df.iloc[0]  # First row, by integer position
df.loc[0]   # Row whose index label is 0
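
With the default index (0, 1, 2, ...) iloc and loc look identical, so the difference is easier to see with other labels. This sketch assumes a made-up index of 10, 20, 30.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Sagar", "Aman", "Riya"], "Age": [25, 23, 22]},
    index=[10, 20, 30],   # hypothetical row labels
)

first_by_position = df.iloc[0]   # position 0 -> the row labelled 10
row_by_label = df.loc[20]        # label 20 -> the second row

print(first_by_position["Name"])  # Sagar
print(row_by_label["Name"])       # Aman

# df.loc[0] would raise a KeyError here, because no row is labelled 0.

# loc also accepts a boolean condition plus a list of columns.
older_than_22 = df.loc[df["Age"] > 22, ["Name", "Age"]]
print(older_than_22)
```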

🔄 Adding & Modifying Data

➕ Add New Column

df["Salary"] = [50000, 60000, 55000]

✏️ Modify Existing Column

df["Age"] = df["Age"] + 1

❌ Deleting Data

df.drop("City", axis=1, inplace=True)  # Drop column
df.drop(0, axis=0, inplace=True)       # Drop row
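
If you leave out inplace=True, drop() returns a new DataFrame and the original stays as it was. The sketch below shows that behaviour using the columns= and index= keywords, which are alternatives to passing axis. The sample data is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sagar", "Aman", "Riya"],
    "Age": [25, 23, 22],
    "City": ["Delhi", "Mumbai", "Chandigarh"],
})

# Equivalent to df.drop("City", axis=1), but returns a new DataFrame.
without_city = df.drop(columns="City")

# Equivalent to drop(0, axis=0): removes the row labelled 0.
without_first = without_city.drop(index=0)

print(list(df.columns))            # ['Name', 'Age', 'City'] (unchanged)
print(list(without_city.columns))  # ['Name', 'Age']
print(list(without_first.index))   # [1, 2]
```

Assigning the result to a new name keeps the original data available, which can help while you are still exploring.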

DataFrame Properties

df.shape  # (rows, columns) as a tuple
df.size   # Total elements (rows × columns)
df.ndim   # Number of dimensions (2 for a DataFrame)
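
For a concrete sense of what these properties return, here is a short sketch on an assumed 3 × 3 DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sagar", "Aman", "Riya"],
    "Age": [25, 23, 22],
    "City": ["Delhi", "Mumbai", "Chandigarh"],
})

rows, cols = df.shape   # shape is a (rows, columns) tuple

print(rows, cols)  # 3 3
print(df.size)     # 9
print(df.ndim)     # 2
```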

⚑ Real-World Example

import pandas as pd

df = pd.read_csv("employees.csv")
print(df.head())  # Show top data

# Add bonus column
df["Bonus"] = df["Salary"] * 0.10

# Filter employees with high salary
high_salary = df[df["Salary"] > 50000]
print(high_salary)

👉 Loading a file, deriving a new column, and filtering rows is a typical DataFrame workflow in real projects.
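
To run the same workflow without an employees.csv file, here is a self-contained sketch. The CSV contents are hypothetical and only stand in for a real file.

```python
import io
import pandas as pd

# Hypothetical contents standing in for "employees.csv".
csv_text = """Name,Salary
Sagar,50000
Aman,60000
Riya,55000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Add a bonus column worth 10% of the salary.
df["Bonus"] = df["Salary"] * 0.10

# Keep only rows with a salary strictly above 50000.
high_salary = df[df["Salary"] > 50000]

print(high_salary)
```

Because the comparison is strict (>), the employee earning exactly 50000 is left out of the result.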


🚀 Common Mistakes to Avoid

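  • ❌ Forgetting axis=1 in drop() when removing a column (the default, axis=0, looks for a row label)
  • ❌ Calling drop() without inplace=True and without assigning the result, so the change is lost
  • ❌ Not checking for missing values (df.info())
  • ❌ Confusing loc (label-based) with iloc (position-based)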


🏁 Conclusion

DataFrames are the core of pandas and the starting point for most data analysis tasks in Python.

Mastering them will allow you to:

  • Clean data efficiently
  • Perform analysis
  • Build real-world data-driven applications

👉 Practice with real datasets to gain confidence.


