List of Pandas functions in the song:
df.sort_values('feature', ascending=False): Sorts the DataFrame by the ‘feature’ column in descending order.
df.reset_index(): Resets the index of the DataFrame, moving the current index into a column and creating a new sequential index (0, 1, 2, ...).
df.drop(columns=['width', 'height']): Removes the ‘width’ and ‘height’ columns from the DataFrame.
df.drop_duplicates(): Removes duplicate rows, keeping the first occurrence of each.
df.sample(frac=0.2): Returns a random sample of 20% of the rows.
df.sample(n=20): Returns a random sample of 20 rows.
df.plot.hist(): Draws a histogram of the numeric columns in the DataFrame.
df.plot.scatter(x, y): Draws a scatter plot with column ‘x’ on the horizontal axis and column ‘y’ on the vertical axis.
df.dropna(): Removes rows that contain any missing (NaN or None) values.
df.fillna(value): Replaces missing (NaN or None) values with the given value.
Example code:
The examples below, with their outputs, assume pandas has been imported with import pandas as pd.
df.sort_values('feature', ascending=False):
df = pd.DataFrame({'feature': [10, 20, 15], 'name': ['A', 'B', 'C']})
df_sorted = df.sort_values('feature', ascending=False)
print(df_sorted)
Output:
   feature name
1       20    B
2       15    C
0       10    A
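A small follow-up sketch, using only documented sort_values parameters: sort_values returns a new DataFrame and leaves the original untouched, and the sorted copy keeps its old row labels (1, 2, 0) unless ignore_index=True is passed (available in pandas 1.0 and later).

```python
import pandas as pd

df = pd.DataFrame({'feature': [10, 20, 15], 'name': ['A', 'B', 'C']})

# sort_values returns a new DataFrame by default; df keeps its original order
df_sorted = df.sort_values('feature', ascending=False)
print(df['feature'].tolist())         # [10, 20, 15]
print(df_sorted['feature'].tolist())  # [20, 15, 10]
print(df_sorted.index.tolist())       # [1, 2, 0], the old row labels

# ignore_index=True relabels the sorted rows 0..n-1 instead
df_relabelled = df.sort_values('feature', ascending=False, ignore_index=True)
print(df_relabelled.index.tolist())   # [0, 1, 2]
```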
df.reset_index():
df = pd.DataFrame({'name': ['A', 'B', 'C']}, index=[10, 20, 30])
df_reset = df.reset_index()
print(df_reset)
Output:
   index name
0     10    A
1     20    B
2     30    C
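If the old index is not worth keeping, reset_index also accepts drop=True, which discards it rather than inserting it as an ‘index’ column. A minimal comparison of the two behaviours:

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C']}, index=[10, 20, 30])

# Default: the old index (10, 20, 30) becomes a new column named 'index'
kept = df.reset_index()
# drop=True: the old index is thrown away instead of becoming a column
dropped = df.reset_index(drop=True)

print(kept.columns.tolist())     # ['index', 'name']
print(dropped.columns.tolist())  # ['name']
print(dropped.index.tolist())    # [0, 1, 2]
```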
df.drop(columns=['width', 'height']):
df = pd.DataFrame({'width': [10, 20], 'height': [30, 40], 'depth': [5, 10]})
df_dropped = df.drop(columns=['width', 'height'])
print(df_dropped)
Output:
   depth
0      5
1     10
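One detail worth knowing about drop(columns=...): naming a column that does not exist raises a KeyError, while errors='ignore' drops only the labels that are present. In this sketch ‘colour’ is a hypothetical column name chosen because it is missing from the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'width': [10, 20], 'height': [30, 40], 'depth': [5, 10]})

# By default, asking to drop a column that does not exist raises KeyError
raised = False
try:
    df.drop(columns=['width', 'colour'])  # 'colour' is a hypothetical missing column
except KeyError as exc:
    raised = True
    print('KeyError:', exc)

# errors='ignore' skips labels that are not present and drops the rest
df_dropped = df.drop(columns=['width', 'colour'], errors='ignore')
print(df_dropped.columns.tolist())  # ['height', 'depth']
```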
df.drop_duplicates():
df = pd.DataFrame({'name': ['A', 'B', 'A'], 'value': [1, 2, 1]})
df_unique = df.drop_duplicates()
print(df_unique)
Output:
  name  value
0    A      1
1    B      2
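By default drop_duplicates treats rows as duplicates only when every column matches. The subset and keep parameters change that; the sketch below uses slightly different data from the example above (the values are made up so that no two full rows are identical).

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'A'], 'value': [1, 2, 3]})

# No arguments: rows must match in every column, and none do here
all_cols = df.drop_duplicates()
# subset=['name'] compares only 'name'; keep='last' retains the final occurrence
by_name = df.drop_duplicates(subset=['name'], keep='last')

print(len(all_cols))  # 3
print(by_name)        # rows 1 (B, 2) and 2 (A, 3)
```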
df.sample(frac=0.2):
df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E'], 'value': [1, 2, 3, 4, 5]})
df_sampled = df.sample(frac=0.2)
print(df_sampled)
Output (random selection, may vary):
  name  value
0    A      1
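Because sample is random, the selected row changes between runs. Passing random_state fixes the seed so the same rows come back every time; the seed value 42 below is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E'], 'value': [1, 2, 3, 4, 5]})

# frac=0.2 of 5 rows is 1 row; a fixed random_state makes the draw repeatable
first = df.sample(frac=0.2, random_state=42)
second = df.sample(frac=0.2, random_state=42)

print(len(first))            # 1
print(first.equals(second))  # True: same seed, same row
```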
df.sample(n=2) (n=2 instead of 20, since this DataFrame has only five rows):
df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E'], 'value': [1, 2, 3, 4, 5]})
df_sampled = df.sample(n=2)
print(df_sampled)
Output (random selection, may vary):
  name  value
1    B      2
4    E      5
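The song's df.sample(n=20) needs a DataFrame with at least 20 rows. On the five-row example, sampling without replacement raises a ValueError, while replace=True lets rows be drawn more than once. A short sketch (random_state=0 is an arbitrary seed):

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E'], 'value': [1, 2, 3, 4, 5]})

# Without replacement, n cannot exceed the number of rows (5 here)
raised = False
try:
    df.sample(n=20)
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# replace=True draws with replacement, so the same row can appear repeatedly
big = df.sample(n=20, replace=True, random_state=0)
print(len(big))  # 20
```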
df.plot.hist():
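import matplotlib.pyplot as plt
df = pd.DataFrame({'data': [1, 2, 2, 3, 3, 3, 4]})
df.plot.hist()
plt.show()  # displays the figure when running as a script rather than in a notebook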
Output: (Histogram plot will display the frequency of the values)
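To see the bar heights without drawing anything, the counts can be computed directly. This sketch assumes the plot uses pandas' default of 10 equal-width bins and reproduces that binning with numpy.histogram, which is a stand-in calculation rather than the plotting call itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': [1, 2, 2, 3, 3, 3, 4]})

# 10 equal-width bins spanning the data range 1..4 (each bin is 0.3 wide)
counts, edges = np.histogram(df['data'], bins=10)
print(counts.tolist())  # [1, 0, 0, 2, 0, 0, 3, 0, 0, 1]
print(edges[0], edges[-1])
```

The non-empty bins hold 1, 2, 3 and 1 values, matching how often 1, 2, 3 and 4 occur in the column.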
df.plot.scatter(x='x_col', y='y_col'):
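import matplotlib.pyplot as plt
df = pd.DataFrame({'x_col': [1, 2, 3], 'y_col': [4, 5, 6]})
df.plot.scatter(x='x_col', y='y_col')
plt.show()  # displays the figure when running as a script rather than in a notebook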
Output: (Scatter plot will display points at (1,4), (2,5), and (3,6))
df.dropna():
df = pd.DataFrame({'name': ['A', 'B', None], 'value': [1, None, 3]})
df_cleaned = df.dropna()
print(df_cleaned)
Output:
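  name  value
0    A    1.0
(Rows 1 and 2 are both dropped: row 1 has a missing ‘value’ and row 2 a missing ‘name’, because dropna treats None as missing too.)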
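dropna can also be narrowed. subset limits the check to certain columns, and how='all' drops a row only when every column is missing. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', None], 'value': [1, None, 3]})

# subset=['value'] only looks for gaps in 'value', so row 2 survives
by_value = df.dropna(subset=['value'])
# how='all' drops a row only if every column is missing; no row qualifies here
all_missing = df.dropna(how='all')

print(by_value.index.tolist())     # [0, 2]
print(all_missing.index.tolist())  # [0, 1, 2]
```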
df.fillna(value=0):
df = pd.DataFrame({'name': ['A', 'B', None], 'value': [1, None, 3]})
df_filled = df.fillna(value=0)
print(df_filled)
Output:
  name  value
0    A    1.0
1    B    0.0
2    0    3.0
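Filling every column with 0 puts a number into the text column ‘name’. fillna also accepts a dict mapping column names to fill values, so each column can get its own default; ‘unknown’ below is just an illustrative placeholder.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', None], 'value': [1, None, 3]})

# A dict gives each column its own fill value
df_filled = df.fillna({'name': 'unknown', 'value': 0})
print(df_filled)
```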