A DataFrame is a 2-dimensional, tabular data structure, much like an Excel sheet or SQL table, where data is organized into rows and columns.
Whether you’re building:
- Data analysis tools
- Machine learning models
- Dashboards
DataFrames are your foundation.
What is a DataFrame?
A DataFrame is:
- A table with rows (index) and columns
- Each column can have a different data type
- Labeled axes for easy data handling
Key Characteristics:
- Mutable (you can modify data)
- Size can change dynamically
- Supports multiple data types
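To make these characteristics concrete, here is a minimal sketch. The sample values (including the `Height` column) are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Name": ["Sagar", "Aman", "Riya"],
    "Age": [25, 23, 22],
    "Height": [1.75, 1.68, 1.60],
})

# Multiple data types: each column carries its own dtype.
print(df.dtypes)

# Mutable: overwrite a single cell by row label and column name.
df.loc[0, "Age"] = 26

# Size can change: assigning a new column grows the table.
df["City"] = ["Delhi", "Mumbai", "Chandigarh"]

print(df.shape)
```

After the two modifications, the same `df` object holds the updated age and one extra column.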
Creating a DataFrame
1. From a Dictionary
import pandas as pd

data = {
    "Name": ["Sagar", "Aman", "Riya"],
    "Age": [25, 23, 22],
    "City": ["Delhi", "Mumbai", "Chandigarh"],
}

df = pd.DataFrame(data)
print(df)
2. From a List of Lists
data = [
    ["Sagar", 25],
    ["Aman", 23],
    ["Riya", 22],
]

df = pd.DataFrame(data, columns=["Name", "Age"])
print(df)
3. From CSV File
df = pd.read_csv("data.csv")
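If you don't have a `data.csv` on disk, you can try the same call on an in-memory string. This is a small sketch: the CSV contents below are invented, and `io.StringIO` simply stands in for a real file.

```python
import io

import pandas as pd

# Stand-in for a file on disk; the columns and rows are made up.
csv_text = """Name,Age,City
Sagar,25,Delhi
Aman,23,Mumbai
Riya,22,Chandigarh
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The first line of the text becomes the column names, and pandas infers a numeric type for `Age`.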
Understanding DataFrame Structure
Important Components:
- Index: row labels
- Columns: column names
- Values: the actual data
Example:
print(df.index)
print(df.columns)
print(df.values)
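Here is a slightly fuller sketch of the three components. The custom row labels `"a"`, `"b"`, `"c"` are an assumption added for illustration, so the index is easier to tell apart from row positions.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Sagar", "Aman", "Riya"], "Age": [25, 23, 22]},
    index=["a", "b", "c"],  # custom row labels, chosen for illustration
)

row_labels = list(df.index)      # the index: one label per row
column_names = list(df.columns)  # the column names
values = df.to_numpy()           # the underlying data as a NumPy array

print(row_labels)
print(column_names)
print(values)
```

`df.to_numpy()` returns the same data as `df.values`; the pandas documentation recommends it for new code.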
Exploring Data
Before analysis, always explore your dataset:
df.head() # First 5 rows
df.tail() # Last 5 rows
df.info() # Column data types & non-null counts
df.describe() # Statistical summary
These functions help you quickly understand your data.
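As a quick sketch of what exploration can look like on a dataset with a gap in it, the example below uses made-up data (the fourth row and the missing age are assumptions for illustration) and adds `isna().sum()` to count missing values per column.

```python
import pandas as pd

# Hypothetical data with one missing age.
df = pd.DataFrame({
    "Name": ["Sagar", "Aman", "Riya", "Neha"],
    "Age": [25, None, 22, 24],
})

print(df.head(2))   # first 2 rows instead of the default 5
df.info()           # dtypes and non-null counts, printed directly

# Count missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Because of the missing entry, pandas stores `Age` as a floating-point column here, which is worth noticing before doing arithmetic on it.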
Selecting Data
Select a Column
df["Name"]
Select Multiple Columns
df[["Name", "Age"]]
Select Rows
df.iloc[0] # First row
df.loc[0] # By index label
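With the default index (0, 1, 2, ...), `iloc[0]` and `loc[0]` return the same row, which hides the difference. The sketch below uses a made-up index of 10, 20, 30 so that positions and labels no longer coincide.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Sagar", "Aman", "Riya"], "Age": [25, 23, 22]},
    index=[10, 20, 30],  # illustrative labels that differ from positions
)

first_by_position = df.iloc[0]  # position 0 is the row labeled 10
row_by_label = df.loc[20]       # label 20 is the second row

# df.loc[0] would raise a KeyError here: no row is labeled 0.

# loc slices include the end label; iloc slices exclude the end position.
by_label_slice = df.loc[10:20]     # rows labeled 10 and 20
by_position_slice = df.iloc[0:1]   # only the first row

print(first_by_position["Name"], row_by_label["Name"])
print(len(by_label_slice), len(by_position_slice))
```

A useful rule of thumb: `iloc` counts rows, `loc` looks up labels.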
Adding & Modifying Data
Add New Column
df["Salary"] = [50000, 60000, 55000]
Modify Existing Column
df["Age"] = df["Age"] + 1
Deleting Data
df.drop("City", axis=1, inplace=True) # Drop column
df.drop(0, axis=0, inplace=True) # Drop row
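An alternative to `inplace=True` is to keep the result that `drop()` returns. The sketch below assumes the same sample data as earlier and uses the `columns=` and `index=` keywords, which say the same thing as `axis=1` and `axis=0`.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sagar", "Aman", "Riya"],
    "Age": [25, 23, 22],
    "City": ["Delhi", "Mumbai", "Chandigarh"],
})

# Without inplace=True, drop() returns a new DataFrame
# and leaves the original untouched.
without_city = df.drop(columns="City")   # same as df.drop("City", axis=1)
without_first_row = df.drop(index=0)     # same as df.drop(0, axis=0)

print(without_city.columns.tolist())
print(without_first_row.index.tolist())
print(df.shape)  # the original still has every row and column
```

Assigning the result to a new name keeps the original around, which can make it easier to retrace your steps.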
DataFrame Properties
df.shape # Rows & columns
df.size # Total elements
df.ndim # Number of dimensions
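For a small table, the three properties relate to each other as in this sketch (sample data assumed as before):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sagar", "Aman", "Riya"],
    "Age": [25, 23, 22],
})

rows, cols = df.shape  # shape is a (rows, columns) tuple

print(df.shape)  # (3, 2)
print(df.size)   # 6, i.e. rows * cols
print(df.ndim)   # 2, a DataFrame always has two dimensions
```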
Real-World Example
import pandas as pd

df = pd.read_csv("employees.csv")
# Show top data
print(df.head())
# Add bonus column
df["Bonus"] = df["Salary"] * 0.10
# Filter employees with high salary
high_salary = df[df["Salary"] > 50000]
print(high_salary)
This load, derive, and filter pattern shows up constantly in real projects.
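To run the same steps without an `employees.csv` file, here is a self-contained sketch. The names and salaries are invented, and the in-memory string stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for employees.csv.
csv_text = """Name,Salary
Sagar,50000
Aman,60000
Riya,55000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Derive a new column from an existing one (10% of salary).
df["Bonus"] = df["Salary"] * 0.10

# Boolean filtering: keep only rows where the condition is True.
high_salary = df[df["Salary"] > 50000]
print(high_salary)
```

Note that the comparison is strict, so an employee earning exactly 50000 is left out of `high_salary`; use `>=` if that row should be included.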
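Common Mistakes to Avoid
- Forgetting axis in drop()
- Calling methods such as drop() without assigning the result or passing inplace=True (the original DataFrame stays unchanged)
- Not checking for missing values (df.info())
- Confusing loc (label-based) with iloc (position-based)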
External Resources
- Official Docs: https://pandas.pydata.org/docs/
- Pandas GitHub: https://github.com/pandas-dev/pandas
- NumPy Library: https://numpy.org/
Conclusion
DataFrames are the core of Pandas and the starting point of any data analysis task in Python.
Mastering them will allow you to:
- Clean data efficiently
- Perform analysis
- Build real-world data-driven applications
Practice with real datasets to gain confidence.