50 Pandas Interview Questions and Answers

Below are 50 practical Pandas questions and answers focusing on common operations, data manipulation, and analysis techniques using Pandas DataFrames and Series in Python.

Each question includes:

  • A scenario or requirement
  • A code snippet demonstrating one possible solution

Note: All examples assume that df (or df1 and df2) is an existing DataFrame containing the columns referenced in the snippet, and that Pandas has been imported as follows:

import pandas as pd

1. How do you create a Pandas DataFrame from a dictionary?

Answer:
Use pd.DataFrame() with a dictionary of lists or values.

data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)

2. How do you read a CSV file into a DataFrame?

Answer:
Use pd.read_csv() with the file path.

df = pd.read_csv('data.csv')
print(df.head())

3. How do you write a DataFrame to a CSV file?

Answer:
Use df.to_csv() with a file path.

df.to_csv('output.csv', index=False)

4. How do you get the first 5 rows of a DataFrame?

Answer:
Use df.head().

print(df.head())

5. How do you get the shape (rows, columns) of a DataFrame?

Answer:
Use df.shape.

print(df.shape)

6. How do you select a single column from a DataFrame?

Answer:
Use df['column_name'].

ages = df['Age']
print(ages)

7. How do you select multiple columns from a DataFrame?

Answer:
Use a list of column names.

subset = df[['Name', 'Age']]
print(subset)

8. How do you select rows by index range?

Answer:
Use positional slicing with df.iloc; the end position is excluded.

rows = df.iloc[0:3]  # first 3 rows
print(rows)

9. How do you select rows based on a condition?

Answer:
Use boolean indexing.

adults = df[df['Age'] > 18]
print(adults)

10. How do you filter rows based on multiple conditions?

Answer:
Combine conditions with & (and) or | (or), wrapping each condition in parentheses.

older_males = df[(df['Age'] > 25) & (df['Gender'] == 'Male')]
print(older_males)
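
As a minimal, self-contained sketch of the operators above (the sample data is made up for illustration), the following applies &, | and the ~ negation to a small frame. The parentheses matter because & and | bind more tightly than comparisons such as > and ==.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
    'Age': [25, 30, 17, 42],
    'Gender': ['Female', 'Male', 'Female', 'Male'],
})

# AND: both conditions must hold
older_males = df[(df['Age'] > 25) & (df['Gender'] == 'Male')]

# OR: at least one condition must hold
minor_or_male = df[(df['Age'] < 18) | (df['Gender'] == 'Male')]

# NOT: ~ inverts a boolean mask
not_male = df[~(df['Gender'] == 'Male')]

print(older_males)
print(minor_or_male)
print(not_male)
```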

11. How do you reset the index of a DataFrame?

Answer:
Use df.reset_index(). By default the old index is kept as a new column; pass drop=True to discard it.

df_reset = df.reset_index(drop=True)
print(df_reset)
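
To make the effect of drop visible, here is a small sketch on made-up data. Sorting is used only to produce an out-of-order index to reset.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol'], 'Age': [25, 30, 17]})

# Sorting leaves the original labels attached to the rows: 2, 0, 1
sorted_df = df.sort_values('Age')

# Default: the old index is preserved in a column named 'index'
kept = sorted_df.reset_index()

# drop=True: the old index is discarded
dropped = sorted_df.reset_index(drop=True)

print(kept)
print(dropped)
```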

12. How do you sort a DataFrame by a column?

Answer:
Use df.sort_values().

df_sorted = df.sort_values('Age')
print(df_sorted)

13. How do you sort by multiple columns?

Answer:
Pass a list of column names to sort_values().

df_sorted = df.sort_values(['Gender', 'Age'], ascending=[True, False])
print(df_sorted)

14. How do you drop a column from a DataFrame?

Answer:
Use df.drop() with axis=1.

df_dropped = df.drop('Age', axis=1)
print(df_dropped)

15. How do you drop rows with missing values?

Answer:
Use df.dropna().

df_clean = df.dropna()
print(df_clean)

16. How do you fill missing values with a constant?

Answer:
Use df.fillna(value).

df_filled = df.fillna(0)
print(df_filled)

17. How do you fill missing values with the mean of a column?

Answer:
Calculate mean and then use fillna().

mean_age = df['Age'].mean()
df['Age'] = df['Age'].fillna(mean_age)
print(df)

18. How do you group a DataFrame by a column and compute an aggregation?

Answer:
Use df.groupby('column')['value_column'] followed by an aggregation method such as .mean(), or the equivalent .agg('mean').

avg_age = df.groupby('Gender')['Age'].mean()
print(avg_age)
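
When more than one statistic is needed, named aggregation is one option. The sketch below uses made-up data, and the output column names (mean_age, max_age, n_rows) are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Female', 'Male'],
    'Age': [25, 30, 35, 40],
})

# Single aggregation, written with .agg and a function name
avg_age = df.groupby('Gender')['Age'].agg('mean')

# Several aggregations at once: output_name=(input_column, function)
summary = df.groupby('Gender').agg(
    mean_age=('Age', 'mean'),
    max_age=('Age', 'max'),
    n_rows=('Age', 'count'),
)

print(avg_age)
print(summary)
```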

19. How do you get the unique values of a column?

Answer:
Use df['column'].unique().

unique_names = df['Name'].unique()
print(unique_names)

20. How do you count the number of unique values in a column?

Answer:
Use df['column'].nunique().

count_unique = df['Name'].nunique()
print(count_unique)

21. How do you rename a column in a DataFrame?

Answer:
Use df.rename(columns={'old':'new'}).

df = df.rename(columns={'Name': 'FullName'})
print(df.head())

22. How do you apply a function to every element of a column?

Answer:
Use df['column'].apply(function).

df['AgePlusOne'] = df['Age'].apply(lambda x: x + 1)
print(df.head())

23. How do you create a new column based on a condition?

Answer:
Use np.where or apply a function.

import numpy as np
df['Adult'] = np.where(df['Age'] >= 18, 'Yes', 'No')
print(df.head())

24. How do you combine two DataFrames vertically (append rows)?

Answer:
Use pd.concat([df1, df2]).

df_combined = pd.concat([df1, df2], ignore_index=True)
print(df_combined)

25. How do you merge two DataFrames on a common column?

Answer:
Use pd.merge(df1, df2, on='column').

merged = pd.merge(df1, df2, on='ID')
print(merged.head())
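
pd.merge performs an inner join by default, so rows without a match in both frames are dropped. The sketch below, on made-up df1 and df2, compares it with the how='left' and how='outer' options.

```python
import pandas as pd

# Hypothetical sample frames, for illustration only
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Carol']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'City': ['Paris', 'Rome', 'Oslo']})

# Inner join (default): only IDs present in both frames
inner = pd.merge(df1, df2, on='ID')

# Left join: every row of df1, with NaN where df2 has no match
left = pd.merge(df1, df2, on='ID', how='left')

# Outer join: IDs from either frame
outer = pd.merge(df1, df2, on='ID', how='outer')

print(inner)
print(left)
print(outer)
```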

26. How do you get the summary statistics of a DataFrame?

Answer:
Use df.describe().

stats = df.describe()
print(stats)

27. How do you check the data types of each column?

Answer:
Use df.dtypes.

print(df.dtypes)

28. How do you change the data type of a column?

Answer:
Use df['column'].astype(dtype).

df['Age'] = df['Age'].astype(int)  # raises an error if the column contains NaN
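
Casting to int raises a ValueError when the column contains missing values. Two possible workarounds, sketched on made-up data, are the nullable 'Int64' dtype and filling before casting; the fill value 0 is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical sample data with one missing value
df = pd.DataFrame({'Age': [25.0, None, 30.0]})

# Option 1: nullable integer dtype keeps the gap as <NA>
df['Age_nullable'] = df['Age'].astype('Int64')

# Option 2: fill the gap first (0 is an arbitrary choice), then cast
df['Age_filled'] = df['Age'].fillna(0).astype(int)

print(df.dtypes)
print(df)
```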

29. How do you set a column as the index of the DataFrame?

Answer:
Use df.set_index('column').

df_indexed = df.set_index('ID')
print(df_indexed.head())

30. How do you reset the index and drop the old index?

Answer:
Use df.reset_index(drop=True).

df_reset = df.reset_index(drop=True)

31. How do you count the number of missing values in each column?

Answer:
Use df.isna().sum().

missing_counts = df.isna().sum()
print(missing_counts)

32. How do you select rows by label using .loc?

Answer:
Use df.loc[label] for index labels.

row_data = df.loc[0]  # if 0 is an index label
print(row_data)

33. How do you select rows by integer position using .iloc?

Answer:
Use df.iloc[index_position].

row_data = df.iloc[0]
print(row_data)
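
The difference between .loc and .iloc is easiest to see when the index labels are not 0, 1, 2, and so on. The sketch below uses a made-up frame with labels 10, 20, 30.

```python
import pandas as pd

# Hypothetical sample data with a non-default index
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol']}, index=[10, 20, 30])

# .loc looks up the label 10; .iloc looks up the first position
by_label = df.loc[10]
by_position = df.iloc[0]

# Label slices include the end label; positional slices exclude the end
label_slice = df.loc[10:20]  # rows labelled 10 and 20
pos_slice = df.iloc[0:1]     # only the first row

print(by_label)
print(label_slice)
print(pos_slice)
```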

34. How do you remove duplicate rows?

Answer:
Use df.drop_duplicates().

df_unique = df.drop_duplicates()

35. How do you find correlation between columns?

Answer:
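Use df.corr(). Pass numeric_only=True when the DataFrame also contains non-numeric columns, since recent Pandas versions raise an error on them otherwise.

correlation = df.corr(numeric_only=True)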
print(correlation)
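
As a version-independent alternative, the numeric columns can be selected explicitly before correlating. The sketch below uses made-up data in which Income is an exact linear function of Age, so the correlation is 1 up to floating-point error.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
    'Age': [20, 30, 40, 50],
    'Income': [2000, 3000, 4000, 5000],
})

# Keep only numeric columns, then compute the correlation matrix
correlation = df.select_dtypes(include='number').corr()

# Correlation between two specific columns
age_income = df['Age'].corr(df['Income'])

print(correlation)
print(age_income)
```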

36. How do you create a pivot table?

Answer:
Use pd.pivot_table().

pivot = pd.pivot_table(df, values='Sales', index='Region', columns='Product', aggfunc='sum')
print(pivot)
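
A runnable sketch of the same call on made-up sales data follows. The fill_value=0 argument is an optional addition that replaces empty Region/Product combinations with 0 instead of NaN.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [100, 150, 200, 50, 25],
})

# One row per Region, one column per Product, cells hold summed Sales
pivot = pd.pivot_table(
    df,
    values='Sales',
    index='Region',
    columns='Product',
    aggfunc='sum',
    fill_value=0,
)

print(pivot)
```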

37. How do you replace values in a column based on a condition?

Answer:
Use df.loc[condition, 'column'] = new_value.

df.loc[df['Age'] < 0, 'Age'] = 0

38. How do you extract a substring from a column of strings?

Answer:
Use the .str accessor with indexing or slicing.

df['Initial'] = df['Name'].str[0]
print(df.head())

39. How do you convert a column to datetime?

Answer:
Use pd.to_datetime().

df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
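
Real data often contains values that do not parse. One way to handle them, sketched on made-up strings, is errors='coerce', which turns unparseable entries into NaT instead of raising. After conversion the .dt accessor exposes date parts.

```python
import pandas as pd

# Hypothetical sample data: one valid date, one impossible date, one non-date
df = pd.DataFrame({'Date': ['2024-01-15', '2024-02-30', 'not a date']})

# Invalid entries become NaT rather than raising an error
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d', errors='coerce')

# Date parts are available through the .dt accessor
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month

print(df)
```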

40. How do you filter rows based on a substring in a column?

Answer:
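Use df['column'].str.contains('pattern'). Pass na=False so that missing values count as non-matches instead of breaking the boolean mask.

filtered = df[df['Name'].str.contains('A', na=False)]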
print(filtered)
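
str.contains treats the pattern as a regular expression and is case-sensitive by default. The sketch below, on made-up names, illustrates the case, regex and na parameters.

```python
import pandas as pd

# Hypothetical sample data, including a missing name
df = pd.DataFrame({'Name': ['Alice', 'bob', None, 'Anna', 'C.J.']})

# Case-insensitive match; missing names count as non-matches
has_a = df[df['Name'].str.contains('a', case=False, na=False)]

# regex=False matches the dot literally instead of "any character"
has_dot = df[df['Name'].str.contains('.', regex=False, na=False)]

print(has_a)
print(has_dot)
```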

41. How do you change column names to lowercase?

Answer:
Use a list comprehension on df.columns.

df.columns = [col.lower() for col in df.columns]

42. How do you find the index of the maximum value in a column?

Answer:
Use df['column'].idxmax().

max_index = df['Age'].idxmax()
print(max_index)

43. How do you sample random rows from a DataFrame?

Answer:
Use df.sample(n=number).

sampled = df.sample(n=5)
print(sampled)

44. How do you apply a custom function to a DataFrame row-wise?

Answer:
Use df.apply(function, axis=1).

def add_ten(row):
    return row['Age'] + 10

df['Age_plus_10'] = df.apply(add_ten, axis=1)
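
Row-wise apply is most useful when the function needs several columns at once. The sketch below uses made-up data and a hypothetical helper named describe. For simple arithmetic on a single column, a vectorized expression is usually shorter and faster.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['Paris', 'Rome'],
})

def describe(row):
    # Each row arrives as a Series indexed by column name
    return f"{row['Name']} ({row['Age']}) from {row['City']}"

df['Summary'] = df.apply(describe, axis=1)

# Vectorized alternative for single-column arithmetic
df['Age_plus_10'] = df['Age'] + 10

print(df)
```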

45. How do you create dummy variables for categorical columns?

Answer:
Use pd.get_dummies().

df_dummies = pd.get_dummies(df, columns=['Gender'])
print(df_dummies.head())
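
The sketch below runs the same call on made-up data. The dtype=int argument is an optional addition that makes the indicator columns 0/1 integers, since the default dtype differs between Pandas versions.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Gender': ['Female', 'Male', 'Female'],
})

# Gender is replaced by one indicator column per category
df_dummies = pd.get_dummies(df, columns=['Gender'], dtype=int)

print(df_dummies)
```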

46. How do you combine text columns into one column?

Answer:
Use string concatenation.

df['FullInfo'] = df['Name'] + ' - ' + df['City']

47. How do you find rows with any missing values?

Answer:
Use df[df.isna().any(axis=1)].

missing_rows = df[df.isna().any(axis=1)]
print(missing_rows)

48. How do you find rows where all values are missing?

Answer:
Use df[df.isna().all(axis=1)].

all_missing = df[df.isna().all(axis=1)]
print(all_missing)

49. How do you remove rows based on a condition?

Answer:
Use boolean indexing to keep only the rows that should remain. The example below removes rows where Age is under 18.

df_filtered = df[df['Age'] >= 18]
print(df_filtered)

50. How do you assign values to a column conditionally using .loc?

Answer:
Use df.loc[condition, 'column'] = value.

df.loc[df['Age'] > 30, 'Category'] = 'Senior'
print(df.head())
