Below are 50 practical Pandas questions and answers focusing on common operations, data manipulation, and analysis techniques using Pandas DataFrames and Series in Python.
Each question includes:
- A scenario or requirement
- A code snippet demonstrating one possible solution
Note: For all examples, assume you have already imported Pandas as follows:
import pandas as pd

Answer:
Use pd.DataFrame() with a dictionary of lists or values.
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)

Answer:
Use pd.read_csv() with the file path.
df = pd.read_csv('data.csv')
print(df.head())

Answer:
Use df.to_csv() with a file path.
df.to_csv('output.csv', index=False)  # index=False leaves the row index out of the file

Answer:
Use df.head(), which returns the first 5 rows by default (pass n, e.g. df.head(10), for a different count).
print(df.head())

Answer:
Use df.shape, which returns a (rows, columns) tuple.
print(df.shape)

Answer:
Use df['column_name'].
ages = df['Age']
print(ages)

Answer:
Use a list of column names.
subset = df[['Name', 'Age']]
print(subset)

Answer:
Use slicing with df.iloc.
rows = df.iloc[0:3] # first 3 rows
print(rows)

Answer:
Use boolean indexing.
adults = df[df['Age'] > 18]
print(adults)

Answer:
Use & or | operators with parentheses.
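In pandas, & and | bind more tightly than comparisons such as > and ==. Each condition therefore needs its own parentheses. Without them, df['Age'] > 25 & df['Gender'] == 'Male' is parsed as df['Age'] > (25 & df['Gender']) == 'Male', which typically raises an error. This sketch also shows ~ for negating a mask. The names, ages, and genders are made-up sample data.

```python
import pandas as pd

# Illustrative data (values are made up for this sketch)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
    'Age': [25, 30, 35, 22],
    'Gender': ['Female', 'Male', 'Female', 'Male'],
})

# Build each boolean mask separately; this avoids precedence surprises
is_older = df['Age'] > 25
is_male = df['Gender'] == 'Male'

both = df[is_older & is_male]      # AND: older than 25 and male
either = df[is_older | is_male]    # OR: older than 25 or male
not_male = df[~is_male]            # NOT: negate a boolean mask

print(both['Name'].tolist())
print(either['Name'].tolist())
print(not_male['Name'].tolist())
```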
older_males = df[(df['Age'] > 25) & (df['Gender'] == 'Male')]
print(older_males)

Answer:
Use df.reset_index().
df_reset = df.reset_index(drop=True)
print(df_reset)

Answer:
Use df.sort_values().
df_sorted = df.sort_values('Age')
print(df_sorted)

Answer:
Pass a list of column names to sort_values().
df_sorted = df.sort_values(['Gender', 'Age'], ascending=[True, False])
print(df_sorted)

Answer:
Use df.drop() with axis=1.
df_dropped = df.drop('Age', axis=1)
print(df_dropped)

Answer:
Use df.dropna().
df_clean = df.dropna()
print(df_clean)

Answer:
Use df.fillna(value).
df_filled = df.fillna(0)
print(df_filled)

Answer:
Calculate mean and then use fillna().
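Series.mean() skips missing values by default (skipna=True). The fill value is therefore the average of only the values that are present. A minimal sketch with made-up ages:

```python
import numpy as np
import pandas as pd

# Illustrative data with two missing ages
df = pd.DataFrame({'Age': [20.0, np.nan, 30.0, np.nan]})

# mean() ignores NaN by default, so this averages only 20 and 30
mean_age = df['Age'].mean()

# Replace each NaN with the computed mean
df['Age'] = df['Age'].fillna(mean_age)
print(df['Age'].tolist())
```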
mean_age = df['Age'].mean()
df['Age'] = df['Age'].fillna(mean_age)
print(df)

Answer:
Use df.groupby('column'), select the column to aggregate, then apply an aggregation such as .mean() (equivalently, .agg('mean')).
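To compute several statistics per group in one call, pandas supports named aggregation (available since pandas 0.25). Each keyword argument to agg() becomes an output column, defined as a (column, function) pair. This sketch uses made-up Gender/Age data, and the output names mean_age, max_age, and n are arbitrary choices.

```python
import pandas as pd

# Illustrative data (assumed columns: Gender, Age)
df = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Female', 'Male', 'Male'],
    'Age': [25, 30, 35, 40, 20],
})

# Single aggregation: one value per group
avg_age = df.groupby('Gender')['Age'].mean()

# Several aggregations at once: keyword = (source column, function)
summary = df.groupby('Gender').agg(
    mean_age=('Age', 'mean'),
    max_age=('Age', 'max'),
    n=('Age', 'count'),  # number of non-missing ages per group
)

print(avg_age)
print(summary)
```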
avg_age = df.groupby('Gender')['Age'].mean()
print(avg_age)

Answer:
Use df['column'].unique().
unique_names = df['Name'].unique()
print(unique_names)

Answer:
Use df['column'].nunique().
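unique() and nunique() treat missing values differently. unique() includes NaN in the returned array, while nunique() excludes it unless dropna=False is passed. A small sketch with a made-up Series:

```python
import numpy as np
import pandas as pd

# Illustrative Series with a repeated value and a missing value
names = pd.Series(['Alice', 'Bob', 'Alice', np.nan])

uniq = names.unique()                      # order of appearance, NaN included
n_default = names.nunique()                # NaN excluded by default
n_with_nan = names.nunique(dropna=False)   # NaN counted as a distinct value

print(uniq, n_default, n_with_nan)
```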
count_unique = df['Name'].nunique()
print(count_unique)

Answer:
Use df.rename(columns={'old':'new'}).
df = df.rename(columns={'Name': 'FullName'})
print(df.head())

Answer:
Use df['column'].apply(function).
df['AgePlusOne'] = df['Age'].apply(lambda x: x + 1)
print(df.head())

Answer:
Use np.where or apply a function.
import numpy as np
df['Adult'] = np.where(df['Age'] >= 18, 'Yes', 'No')
print(df.head())

Answer:
Use pd.concat([df1, df2]).
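By default, pd.concat() stacks rows (axis=0) and keeps each frame's original index labels, which can produce duplicate labels. ignore_index=True renumbers the result from 0. Passing axis=1 instead places frames side by side, aligned on the index. This sketch uses two small made-up frames, and the ID, Name, and Score values are illustrative.

```python
import pandas as pd

# Two small illustrative frames with the same columns
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [3, 4], 'Name': ['Carol', 'Dan']})

# Stack rows; without ignore_index the labels would be 0, 1, 0, 1
stacked = pd.concat([df1, df2], ignore_index=True)

# Place frames side by side instead (rows aligned on the index)
scores = pd.DataFrame({'Score': [88, 92]})
side_by_side = pd.concat([df1, scores], axis=1)

print(stacked)
print(side_by_side)
```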
df_combined = pd.concat([df1, df2], ignore_index=True)
print(df_combined)

Answer:
Use pd.merge(df1, df2, on='column').
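pd.merge() performs an inner join by default (how='inner'), so rows whose key appears in only one frame are dropped. how='left' keeps every row of the left frame and fills unmatched columns with NaN. indicator=True adds a _merge column recording whether each row matched. This sketch uses two made-up frames, people and orders, that share an ID column.

```python
import pandas as pd

# Illustrative frames sharing an 'ID' key column
people = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Carol']})
orders = pd.DataFrame({'ID': [1, 1, 3], 'Amount': [50, 20, 70]})

# Default how='inner': only IDs present in both frames (ID 2 is dropped)
inner = pd.merge(people, orders, on='ID')

# how='left': keep every person; unmatched Amount values become NaN.
# indicator=True adds '_merge' with values 'both', 'left_only', or 'right_only'.
left = pd.merge(people, orders, on='ID', how='left', indicator=True)

print(inner)
print(left)
```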
merged = pd.merge(df1, df2, on='ID')
print(merged.head())

Answer:
Use df.describe().
stats = df.describe()
print(stats)

Answer:
Use df.dtypes.
print(df.dtypes)

Answer:
Use df['column'].astype(dtype).
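Converting to int fails when the column contains missing values, because NaN has no integer representation. Two common workarounds are filling the gaps first or using the nullable 'Int64' extension dtype, which stores missing entries as <NA>. A minimal sketch with a made-up ages Series:

```python
import numpy as np
import pandas as pd

# Illustrative float column with one missing value
ages = pd.Series([25.0, np.nan, 30.0])

# A plain int conversion raises because of the NaN
try:
    ages.astype(int)
except ValueError as exc:
    print('astype(int) failed:', exc)

# Option 1: fill missing values first, then convert
filled = ages.fillna(0).astype(int)

# Option 2: nullable integer dtype keeps missing values as <NA>
nullable = ages.astype('Int64')

print(filled.tolist())
print(nullable)
```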
df['Age'] = df['Age'].astype(int)  # raises an error if 'Age' contains NaN

Answer:
Use df.set_index('column').
df_indexed = df.set_index('ID')
print(df_indexed.head())

Answer:
Use df.reset_index(drop=True).
df_reset = df.reset_index(drop=True)

Answer:
Use df.isna().sum().
missing_counts = df.isna().sum()
print(missing_counts)

Answer:
Use df.loc[label] for index labels.
row_data = df.loc[0] # if 0 is an index label
print(row_data)

Answer:
Use df.iloc[index_position].
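loc and iloc return the same row only while index labels happen to match row positions. After sorting or filtering they diverge, because loc looks up the label and iloc the position. A sketch with made-up data:

```python
import pandas as pd

# Illustrative data with the default index 0, 1, 2
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol'], 'Age': [25, 30, 22]})

# After sorting, the index order becomes 2, 0, 1
df_sorted = df.sort_values('Age')

by_label = df_sorted.loc[0]      # row whose index label is 0 (Alice)
by_position = df_sorted.iloc[0]  # first row in the current order (Carol)

print(by_label['Name'], by_position['Name'])
```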
row_data = df.iloc[0]
print(row_data)

Answer:
Use df.drop_duplicates().
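By default, drop_duplicates() treats a row as a duplicate only when every column matches, and it keeps the first occurrence. subset restricts the comparison to particular columns. keep='last' keeps the final occurrence instead, and keep=False drops all copies. A sketch with made-up Name/City data:

```python
import pandas as pd

# Illustrative data: rows 0 and 2 are full duplicates; 'Alice' appears three times
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Alice'],
    'City': ['Paris', 'Rome', 'Paris', 'Oslo'],
})

# Compare all columns; keep the first copy of each duplicate group
all_cols = df.drop_duplicates()

# Compare only 'Name'; keep the last occurrence of each name
by_name = df.drop_duplicates(subset=['Name'], keep='last')

print(all_cols)
print(by_name)
```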
df_unique = df.drop_duplicates()

Answer:
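Use df.corr(). In pandas 2.0 and later, non-numeric columns raise an error by default. Pass numeric_only=True (available since pandas 1.5) to correlate only the numeric columns.
correlation = df.corr(numeric_only=True)
print(correlation)

Answer: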
Use pd.pivot_table().
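pivot_table() groups by the index and columns arguments and applies aggfunc to values. Combinations with no data become NaN unless fill_value is given. This sketch uses made-up Region/Product/Sales data and includes the equivalent groupby().unstack() formulation for comparison.

```python
import pandas as pd

# Illustrative sales data; there is no West/B combination
sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'A', 'A'],
    'Sales': [100, 50, 80, 20, 30],
})

# Rows = Region, columns = Product, cells = summed Sales.
# fill_value=0 puts 0 in the missing West/B cell instead of NaN.
pivot = pd.pivot_table(sales, values='Sales', index='Region',
                       columns='Product', aggfunc='sum', fill_value=0)

# The same table built with groupby + unstack
via_groupby = sales.groupby(['Region', 'Product'])['Sales'].sum().unstack(fill_value=0)

print(pivot)
```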
pivot = pd.pivot_table(df, values='Sales', index='Region', columns='Product', aggfunc='sum')
print(pivot)

Answer:
Use df.loc[condition, 'column'] = new_value.
df.loc[df['Age'] < 0, 'Age'] = 0

Answer:
Use the .str accessor.
df['Initial'] = df['Name'].str[0]  # first character of each name
print(df.head())

Answer:
Use pd.to_datetime().
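pd.to_datetime() raises on strings it cannot parse. errors='coerce' converts them to NaT (a missing datetime) instead. Once a column is datetime64, the .dt accessor exposes components such as year and month. This sketch uses made-up date strings, and the Year and Month column names are illustrative.

```python
import pandas as pd

# Illustrative strings; the last one cannot be parsed
df = pd.DataFrame({'Date': ['2024-01-15', '2024-02-29', 'not a date']})

# errors='coerce' turns unparseable strings into NaT instead of raising
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d', errors='coerce')

# The .dt accessor extracts date parts (NaN where the date is NaT)
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month

print(df)
```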
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

Answer:
Use df['column'].str.contains('pattern').
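str.contains() is case-sensitive by default and returns NaN for missing values. A mask containing NaN cannot be used for boolean indexing. Passing case=False ignores case, and na=False treats missing values as non-matches. The pattern is also interpreted as a regular expression unless regex=False. This sketch uses made-up names, and has_upper_a and has_b are illustrative variable names.

```python
import numpy as np
import pandas as pd

# Illustrative names, including a lowercase entry and a missing value
df = pd.DataFrame({'Name': ['Alice', 'bob', 'Anna', np.nan]})

# Case-sensitive: 'A' matches 'Alice' and 'Anna' but not 'bob'
has_upper_a = df[df['Name'].str.contains('A', na=False)]

# case=False ignores case; na=False excludes the missing name
has_b = df[df['Name'].str.contains('b', case=False, na=False)]

print(has_upper_a)
print(has_b)
```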
filtered = df[df['Name'].str.contains('A', na=False)]  # na=False treats missing names as non-matches
print(filtered)

Answer:
Use a list comprehension on df.columns.
df.columns = [col.lower() for col in df.columns]

Answer:
Use df['column'].idxmax().
max_index = df['Age'].idxmax()
print(max_index)

Answer:
Use df.sample(n=number).
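df.sample() draws different rows on each call unless random_state is fixed. n cannot exceed the number of rows unless replace=True. frac samples a proportion of rows instead of a fixed count. A sketch on a made-up 10-row frame:

```python
import pandas as pd

# Illustrative 10-row frame
df = pd.DataFrame({'Value': range(10)})

# Same random_state -> same rows on every run
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)

# frac=0.5 draws half of the rows
half = df.sample(frac=0.5, random_state=0)

print(first.index.tolist(), second.index.tolist(), len(half))
```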
sampled = df.sample(n=5)
print(sampled)

Answer:
Use df.apply(function, axis=1).
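With axis=1, the function receives each row as a Series. For simple arithmetic, a vectorized column expression gives the same result and is typically much faster on large frames. In this sketch, Years_Employed and AgeAtHire are hypothetical columns chosen for illustration.

```python
import pandas as pd

# Illustrative data; 'Years_Employed' is a hypothetical column
df = pd.DataFrame({'Age': [25, 30, 22], 'Years_Employed': [2, 8, 1]})

# Row-wise apply: each row arrives as a Series indexed by column name
def age_at_hire(row):
    return row['Age'] - row['Years_Employed']

df['AgeAtHire'] = df.apply(age_at_hire, axis=1)

# Equivalent vectorized version operating on whole columns
df['AgeAtHire_vec'] = df['Age'] - df['Years_Employed']

print(df)
```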
def add_ten_to_age(row):
    # Each row arrives as a Series, so columns are accessed by name
    return row['Age'] + 10
df['Age_plus_10'] = df.apply(add_ten_to_age, axis=1)

Answer:
Use pd.get_dummies().
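get_dummies() replaces each listed categorical column with one indicator column per category, named column_category, and leaves other columns untouched. drop_first=True omits the first category, which avoids a redundant column in some models. The indicator dtype is bool in pandas 2.0 and later (uint8 earlier). A sketch with made-up data:

```python
import pandas as pd

# Illustrative data with one categorical column
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol'],
                   'Gender': ['Female', 'Male', 'Female']})

# 'Gender' is replaced by Gender_Female and Gender_Male
dummies = pd.get_dummies(df, columns=['Gender'])

# drop_first=True keeps only Gender_Male (Female is implied when it is False)
dummies_reduced = pd.get_dummies(df, columns=['Gender'], drop_first=True)

print(dummies)
print(dummies_reduced)
```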
df_dummies = pd.get_dummies(df, columns=['Gender'])
print(df_dummies.head())

Answer:
Use the + operator to concatenate string columns (convert non-string columns with .astype(str) first).
df['FullInfo'] = df['Name'] + ' - ' + df['City']

Answer:
Use df[df.isna().any(axis=1)].
missing_rows = df[df.isna().any(axis=1)]
print(missing_rows)

Answer:
Use df[df.isna().all(axis=1)].
all_missing = df[df.isna().all(axis=1)]
print(all_missing)

Answer:
Use boolean indexing to keep only the rows that should remain (here, rows with an Age below 18 are removed).
df_filtered = df[df['Age'] >= 18]
print(df_filtered)

Answer:
Use df.loc[condition, 'column'] = value.
df.loc[df['Age'] > 30, 'Category'] = 'Senior'
print(df.head())