Function list:
df.groupby('column')
: Groups the DataFrame by the specified column(s) so that aggregate functions such as sum or mean can be applied to each group.
df.pivot_table(values='value', index='column', columns='column2')
: Creates a pivot table that groups the data by the index and column keys and aggregates the values in each cell (the mean by default).
df.resample('time_period')
: Groups time-series data into fixed time periods (e.g., daily, monthly) so that an aggregation can be applied to each period. Requires a datetime-like index.
df.rolling(window=3)
: Slides a fixed-size window over the data, enabling aggregates such as moving averages over each window.
df.expanding(min_periods=1)
: Similar to rolling, but the window grows from the first row to the current one, which gives cumulative calculations.
df.cumsum()
: Computes the cumulative sum down each column of a DataFrame or along a Series, returning running totals in row order.
df.cumprod()
: Computes the cumulative product in the same way, multiplying values in row order.
pd.cut(df['column'], bins=3)
: Sorts continuous values into discrete intervals; with an integer bins argument the intervals have equal width. cut is a top-level pandas function, not a DataFrame method.
pd.qcut(df['column'], q=4)
: Similar to cut, but the bin edges are quantiles (e.g., quartiles for q=4), so each bin holds roughly the same number of values.
df.aggregate(['sum', 'mean'])
: Applies one or more aggregation functions, either to the entire DataFrame or, after groupby(), to each group.
df.transform(lambda x: x - x.mean())
: Applies a function that returns a result with the same shape as its input. On a plain DataFrame it runs per column; after groupby() it runs per group (e.g., centering each group on its mean).
Example code
The examples below show each function in use, with a brief explanation of what it does. Each snippet assumes pandas has been imported as pd (import pandas as pd).
df.groupby('column')
: Groups the DataFrame by the specified column(s) and applies an aggregate function like sum.
df = pd.DataFrame({'category': ['A', 'B', 'A', 'B'], 'value': [10, 20, 30, 40]})
grouped = df.groupby('category').sum()
print(grouped)
# Groups the data by 'category' and sums the 'value' column for each group.
Output:
value
category
A 40
B 60
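Grouping also works on more than one column at a time. The sketch below is a minimal illustration; the `region` column and its values are hypothetical additions to the sample data, not part of the original example.

```python
import pandas as pd

# 'region' is a made-up column added only for this illustration.
df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B'],
    'region': ['east', 'west', 'east', 'east'],
    'value': [10, 20, 30, 40],
})

# Passing a list of columns creates one group per unique combination.
totals = df.groupby(['category', 'region'])['value'].sum()
print(totals)
```

The result is a Series with a two-level index, one entry per (category, region) pair that actually occurs in the data.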
df.pivot_table(values='value', index='category', columns='sub_category')
: Creates a pivot table to summarize data by grouping on index and columns.
df = pd.DataFrame({'category': ['A', 'A', 'B', 'B'], 'sub_category': ['X', 'Y', 'X', 'Y'], 'value': [10, 20, 30, 40]})
pivot = df.pivot_table(values='value', index='category', columns='sub_category')
print(pivot)
# Groups data by 'category' and 'sub_category' and calculates the mean of 'value' (the default aggfunc).
Output:
sub_category X Y
category
A 10.0 20.0
B 30.0 40.0
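Because pivot_table averages by default, the choice of aggregation only becomes visible when several rows fall into the same cell. A small sketch with made-up data that has two rows for the (A, X) cell:

```python
import pandas as pd

# Illustrative data: two rows share category 'A' and sub_category 'X'.
df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B'],
    'sub_category': ['X', 'X', 'Y', 'X'],
    'value': [10, 30, 20, 40],
})

# Default aggregation is the mean of the values in each cell.
mean_pivot = df.pivot_table(values='value', index='category', columns='sub_category')

# Passing aggfunc='sum' adds the values instead.
sum_pivot = df.pivot_table(values='value', index='category',
                           columns='sub_category', aggfunc='sum')

print(mean_pivot)
print(sum_pivot)
```

Cells with no matching rows, such as (B, Y) here, come back as NaN.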
df.resample('M')
: Groups time-series data by the specified time period (e.g., monthly) and applies an aggregate function like sum.
df = pd.DataFrame({'date': pd.date_range('2023-01-01', periods=6, freq='D'), 'value': [1, 2, 3, 4, 5, 6]})
df.set_index('date', inplace=True)
resampled = df.resample('M').sum()
print(resampled)
# Resamples the data by month and calculates the sum of 'value' for each month.
# Note: pandas 2.2 and later deprecate the 'M' alias in favor of 'ME' (month end).
Output:
value
date
2023-01-31 21
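All six dates above fall in January, so the result has a single row. The sketch below uses made-up dates that straddle two months to show the grouping. It uses the 'MS' (month start) alias, which labels each group by the first day of the month.

```python
import pandas as pd

# Illustrative dates spanning the end of January and the start of February.
dates = pd.to_datetime(['2023-01-30', '2023-01-31', '2023-02-01', '2023-02-02'])
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=dates)

# One output row per calendar month, labeled by the month's first day.
monthly = df.resample('MS').sum()
print(monthly)
```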
df.rolling(window=3)
: Groups data into rolling windows of a specified size and computes aggregate functions like sum.
df = pd.DataFrame({'value': [1, 2, 3, 4, 5]})
rolling = df.rolling(window=3).sum()
print(rolling)
# Applies a rolling window of size 3 and calculates the sum for each window.
Output:
value
0 NaN
1 NaN
2 6.0
3 9.0
4 12.0
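The first two rows above are NaN because a window of size 3 is not yet full. A minimal sketch of a moving average that relaxes this with the min_periods parameter:

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3, 4, 5]})

# min_periods=1 lets a window produce a result as soon as it holds one value,
# so the first two rows average a shorter window instead of returning NaN.
moving_avg = df.rolling(window=3, min_periods=1).mean()
print(moving_avg)
```

Whether partial windows are acceptable depends on the analysis; leaving min_periods at its default keeps only full windows.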
df.expanding(min_periods=1)
: Expands the window size over the data and applies cumulative calculations, like sum.
df = pd.DataFrame({'value': [1, 2, 3, 4, 5]})
expanding = df.expanding(min_periods=1).sum()
print(expanding)
# Expands the window and calculates the cumulative sum at each step.
Output:
value
0 1.0
1 3.0
2 6.0
3 10.0
4 15.0
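An expanding window is not limited to sums. The sketch below, using made-up values, computes a running mean and a running maximum, where each row covers everything from the first row up to itself.

```python
import pandas as pd

# Illustrative values chosen so the running mean and max differ visibly.
df = pd.DataFrame({'value': [4, 2, 6, 0, 8]})

# Each row aggregates all rows from the start up to and including itself.
running_mean = df['value'].expanding(min_periods=1).mean()
running_max = df['value'].expanding(min_periods=1).max()

print(running_mean)
print(running_max)
```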
df.cumsum()
: Computes the cumulative sum of values across the DataFrame or Series.
df = pd.DataFrame({'value': [1, 2, 3, 4, 5]})
cumsum = df.cumsum()
print(cumsum)
# Calculates the cumulative sum of the 'value' column.
Output:
value
0 1
1 3
2 6
3 10
4 15
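On its own, cumsum runs over the whole column. To get a running total that restarts for each group, it can be combined with groupby. A minimal sketch, with a hypothetical `group` column added for illustration:

```python
import pandas as pd

# 'group' is a made-up column used only to show the per-group behavior.
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'], 'value': [1, 2, 3, 4]})

# Running total over the whole column, ignoring groups.
overall = df['value'].cumsum()

# Running total that restarts within each group.
per_group = df.groupby('group')['value'].cumsum()

print(overall)
print(per_group)
```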
df.cumprod()
: Computes the cumulative product of values across the DataFrame or Series.
df = pd.DataFrame({'value': [1, 2, 3, 4]})
cumprod = df.cumprod()
print(cumprod)
# Calculates the cumulative product of the 'value' column.
Output:
value
0 1
1 2
2 6
3 24
pd.cut(df['column'], bins=3)
: Groups continuous data into discrete bins.
df = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
df['bins'] = pd.cut(df['value'], bins=3)
print(df)
# Divides the 'value' column into 3 equal-width bins.
Output:
value bins
0 1 (0.992, 3.667]
1 2 (0.992, 3.667]
2 3 (0.992, 3.667]
3 4 (3.667, 6.333]
4 5 (3.667, 6.333]
5 6 (3.667, 6.333]
6 7 (6.333, 9.0]
7 8 (6.333, 9.0]
8 9 (6.333, 9.0]
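Bins become more useful once they are named and used as a grouping key. The sketch below passes explicit bin edges and labels to pd.cut; the edges and the label names `low`, `mid`, and `high` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# Explicit edges give the intervals (0, 3], (3, 6], (6, 9]; labels name them.
df['level'] = pd.cut(df['value'], bins=[0, 3, 6, 9], labels=['low', 'mid', 'high'])

# Analyze the data within each bin; observed=True keeps only bins that occur.
per_bin = df.groupby('level', observed=True)['value'].mean()
print(per_bin)
```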
pd.qcut(df['column'], q=4)
: Groups continuous data into quantile-based bins.
df = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
df['quantiles'] = pd.qcut(df['value'], q=4)
print(df)
# Divides the 'value' column into 4 quantile-based bins of roughly equal size.
Output:
value quantiles
0 1 (0.999, 3.0]
1 2 (0.999, 3.0]
2 3 (0.999, 3.0]
3 4 (3.0, 5.0]
4 5 (3.0, 5.0]
5 6 (5.0, 7.0]
6 7 (5.0, 7.0]
7 8 (7.0, 9.0]
8 9 (7.0, 9.0]
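The difference between cut and qcut shows up most clearly on skewed data. A small sketch with made-up values containing one outlier, counting how many values land in each bin:

```python
import pandas as pd

# Illustrative data: seven small values and one outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# Equal-width bins: the outlier stretches the range, so most values share one bin.
width_counts = pd.cut(values, bins=4).value_counts(sort=False)

# Quantile bins: the edges follow the data, so each bin holds two values here.
quantile_counts = pd.qcut(values, q=4).value_counts(sort=False)

print(width_counts)
print(quantile_counts)
```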
df.aggregate(['sum', 'mean'])
: Applies multiple aggregate functions to the DataFrame.
df = pd.DataFrame({'value1': [1, 2, 3], 'value2': [4, 5, 6]})
aggregated = df.aggregate(['sum', 'mean'])
print(aggregated)
# Aggregates the data using 'sum' and 'mean' functions for each column.
Output:
value1 value2
sum 6.0 15.0
mean 2.0 5.0
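The same method accepts a dict to pick a different function per column, and it works on grouped data as well. A minimal sketch; the `group` column is a hypothetical addition to the sample data.

```python
import pandas as pd

# 'group' is a made-up column added to demonstrate grouped aggregation.
df = pd.DataFrame({'group': ['A', 'A', 'B'],
                   'value1': [1, 2, 3],
                   'value2': [4, 5, 6]})

# A dict maps each column to its own aggregation function.
totals = df[['value1', 'value2']].agg({'value1': 'sum', 'value2': 'mean'})

# A list of functions on grouped data yields one column per (column, function) pair.
by_group = df.groupby('group').agg(['sum', 'mean'])

print(totals)
print(by_group)
```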
df.transform(lambda x: x - x.mean())
: Applies a transformation function; combined with groupby(), it runs on each group and returns a result aligned with the original rows.
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'], 'value': [10, 20, 30, 40]})
transformed = df.groupby('group').transform(lambda x: x - x.mean())
print(transformed)
# Subtracts the mean of each group from the group's values.
Output:
value
0 -5.0
1 5.0
2 -5.0
3 5.0
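For comparison, the sketch below applies the same centering without groupby, where the function sees the whole column, and then uses a grouped transform to attach each group's mean to its rows. The `group_mean` column name is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'], 'value': [10, 20, 30, 40]})

# Without groupby, the function sees the whole column: the overall mean is 25.
centered_overall = df[['value']].transform(lambda x: x - x.mean())

# With groupby, the group's mean is broadcast back to every row of that group.
df['group_mean'] = df.groupby('group')['value'].transform('mean')

print(centered_overall)
print(df)
```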