Group and Aggregate data with custom functions
Group and Aggregate Data
Quickly get per-group summary statistics for your data by choosing built-in aggregation functions or defining your own custom functions.
# Load packages
import pandas as pd
# Load your CSV file into a DataFrame: parse the "date" column as datetimes and use the first column as the index
df = pd.read_csv("data.csv", parse_dates=["date"], index_col=0)
df.head()
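To see what `parse_dates` and `index_col` do without a real file, here is a minimal sketch that reads a small in-memory CSV. The contents, including the `id` column, are made up; only the `date`, `device`, `gender`, and `price` column names are taken from this template.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for data.csv; the first column is an ID
csv_text = """id,date,device,gender,price
1,2023-01-01,mobile,F,10.0
2,2023-01-02,mobile,M,12.5
3,2023-01-03,desktop,F,20.0
4,2023-01-04,desktop,M,18.0
5,2023-01-05,mobile,F,14.0
6,2023-01-06,desktop,F,22.0
"""

# parse_dates converts "date" to a datetime dtype; index_col=0 makes "id" the row index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col=0)
print(df.dtypes)
print(df.head())
```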
Choose the aggregation function
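| Function | Description |
|---|---|
| count | Number of non-null observations |
| sum | Sum of values |
| mean | Mean of values |
| mad | Mean absolute deviation (removed in pandas 2.0) |
| median | Median of values |
| min | Minimum |
| max | Maximum |
| mode | Mode (not a GroupBy method; wrap `x.mode()` in a custom function) |
| abs | Absolute value (element-wise, not an aggregation) |
| prod | Product of values |
| std | Sample standard deviation (Bessel-corrected, `ddof=1`) |
| var | Unbiased variance (`ddof=1`) |
| sem | Standard error of the mean (`ddof=1`) |
| skew | Unbiased skewness (3rd moment) |
| kurt | Unbiased kurtosis (4th moment) |
| quantile | Sample quantile at fraction `q` (default 0.5) |
| cumsum | Cumulative sum (one value per row, not per group) |
| cumprod | Cumulative product (one value per row, not per group) |
| cummax | Cumulative maximum (one value per row, not per group) |
| cummin | Cumulative minimum (one value per row, not per group) |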
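The distinction in the table between reductions and cumulative functions matters in practice: reductions collapse each group to one value, while cumulative functions keep one value per original row and fit better with `transform`. A small sketch on made-up data:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame(
    {
        "device": ["mobile", "mobile", "desktop", "desktop"],
        "price": [10.0, 14.0, 20.0, 22.0],
    }
)
grouped = df.groupby("device")["price"]

# Reductions: one row per group (group keys are sorted by default)
summary = grouped.agg(["count", "sum", "mean", "min", "max"])
print(summary)

# Cumulative functions: one value per original row, aligned to df's index
running_total = grouped.transform("cumsum")
print(running_total)
```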
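# Define a custom function: it receives each group's values as a Series
def custom(x):
    """Return the gap between the mean and the median of x."""
    return x.mean() - x.median()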
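A custom aggregation is just a function that takes a Series and returns a single value, so it can be checked on a plain Series before using it on groups. The sketch below also includes `first_mode`, a hypothetical helper (not part of pandas) showing one way to get a single mode value, since `Series.mode()` can return several values.

```python
import pandas as pd


def custom(x: pd.Series) -> float:
    """Mean minus median; a positive value often hints at a right-skewed distribution."""
    return x.mean() - x.median()


def first_mode(x: pd.Series):
    """Hypothetical helper: the smallest of the most frequent values."""
    # Series.mode() returns all tied modes, sorted; keep the first one
    return x.mode().iloc[0]


prices = pd.Series([1.0, 2.0, 3.0, 10.0])
print(custom(prices))  # mean 4.0 minus median 2.5
print(first_mode(pd.Series([3, 1, 3, 1, 2])))  # modes are 1 and 3
```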
# Group the data
grouped = df.groupby(by=["device", "gender"]) # Choose column(s) to groupby
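Grouping by several columns creates one group per combination of values that actually occurs in the data, keyed by a tuple. This sketch uses made-up rows to show the group sizes and what each group contains:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame(
    {
        "device": ["mobile", "mobile", "desktop", "desktop", "mobile"],
        "gender": ["F", "M", "F", "F", "F"],
        "price": [10.0, 12.5, 20.0, 22.0, 14.0],
    }
)
grouped = df.groupby(by=["device", "gender"])

# Number of rows in each (device, gender) combination present in the data
sizes = grouped.size()
print(sizes)

# Each group is a sub-DataFrame keyed by a (device, gender) tuple
for (device, gender), group in grouped:
    print(device, gender, group["price"].tolist())
```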
# Aggregate the data
aggregation = grouped.agg(
    {
        "price": [  # Column to aggregate over
            "mean",
            "median",
            "std",  # Use standard functions
            custom,  # Or use custom functions
        ]
    }
)
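pandas also supports "named aggregation", where each keyword argument is a `(column, function)` pair and the keyword becomes the output column name. A sketch on made-up data, as an alternative to the dictionary form:

```python
import pandas as pd


def custom(x):
    """Mean minus median of x."""
    return x.mean() - x.median()


# Hypothetical data for illustration
df = pd.DataFrame(
    {
        "device": ["mobile", "mobile", "mobile", "desktop", "desktop"],
        "price": [10.0, 12.0, 20.0, 18.0, 22.0],
    }
)

# Each keyword becomes a flat output column name
aggregation = df.groupby("device").agg(
    price_mean=("price", "mean"),
    price_median=("price", "median"),
    price_custom=("price", custom),
)
print(aggregation)
```

This avoids the two-level column index that the dictionary-of-lists form produces.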
# Examine the results
print(aggregation)
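The dictionary-of-lists form returns columns as a MultiIndex such as `("price", "mean")`, and a custom function's column is labeled with its `__name__`. If you prefer flat column names and ordinary key columns, one common approach is sketched below on made-up data:

```python
import pandas as pd


def custom(x):
    """Mean minus median of x."""
    return x.mean() - x.median()


# Hypothetical data for illustration
df = pd.DataFrame(
    {
        "device": ["mobile", "mobile", "desktop", "desktop"],
        "gender": ["F", "M", "F", "F"],
        "price": [10.0, 12.0, 18.0, 22.0],
    }
)
aggregation = df.groupby(by=["device", "gender"]).agg({"price": ["mean", "median", custom]})

# Join the two column levels, e.g. ("price", "mean") -> "price_mean"
aggregation.columns = ["_".join(col) for col in aggregation.columns]
# Move the group keys from the index back into regular columns
flat = aggregation.reset_index()
print(flat)
```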