Group and Aggregate Data

Summarize your data by group: pick built-in aggregation functions, or define your own, to compute the main summary statistics of a numeric column.

# Load packages
import pandas as pd
# Upload your data as a CSV file and load it as a DataFrame
# (assumes the file has a "date" column; the first column becomes the index)
df = pd.read_csv("data.csv", parse_dates=["date"], index_col=0)
df.head()

Choose the aggregation function

Function    Description
count       Number of non-null observations
sum         Sum of values
mean        Mean of values
mad         Mean absolute deviation (removed in pandas 2.0)
median      Arithmetic median of values
min         Minimum
max         Maximum
mode        Mode
abs         Absolute value
prod        Product of values
std         Unbiased standard deviation
var         Unbiased variance
sem         Unbiased standard error of the mean
skew        Unbiased skewness (3rd moment)
kurt        Unbiased kurtosis (4th moment)
quantile    Sample quantile (value at %)
cumsum      Cumulative sum
cumprod     Cumulative product
cummax      Cumulative maximum
cummin      Cumulative minimum

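
The functions in the table fall into two kinds: reductions such as sum return one value per group, while cumulative functions such as cumsum return one value per original row. The following is a minimal sketch of that difference on hypothetical toy data (the values are invented for illustration; only the column names mirror the ones used in this template).

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame(
    {
        "device": ["mobile", "mobile", "desktop", "desktop"],
        "price": [10.0, 20.0, 5.0, 15.0],
    }
)

by_device = toy.groupby("device")["price"]

# Reduction: one value per group
totals = by_device.sum()

# Cumulative function: one value per original row, computed within each group
running = by_device.cumsum()

# quantile takes the quantile level (between 0 and 1) as an argument
q90 = by_device.quantile(0.9)

print(totals)
print(running)
print(q90)
```

Because the cumulative result keeps the original row index, it can be assigned back to the data frame as a new column, whereas a reduction yields a shorter result indexed by group.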
# Define a custom function
# (mean minus median: a rough indicator of how skewed the values are)
def custom(x):
    return x.mean() - x.median()


# Group the data
grouped = df.groupby(by=["device", "gender"])  # Choose column(s) to groupby

# Aggregate the data
aggregation = grouped.agg(
    {
        "price": [  # Column to aggregate over
            "mean",
            "median",
            "std",  # Use standard functions
            custom,  # Or use custom functions
        ]
    }
)

# Examine the results
print(aggregation)
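
Passing a dictionary of lists to agg produces two-level column labels such as ("price", "mean"). If flat, self-describing column names are preferred, pandas' named aggregation is one alternative. This is a sketch on hypothetical toy data, and the output column names (mean_price and so on) are chosen here purely for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame(
    {
        "device": ["mobile", "mobile", "mobile", "desktop", "desktop"],
        "gender": ["F", "F", "F", "M", "M"],
        "price": [10.0, 20.0, 60.0, 5.0, 15.0],
    }
)


def custom(x):
    # Same custom function as above: mean minus median
    return x.mean() - x.median()


# Named aggregation: output_name=(input_column, function)
summary = toy.groupby(["device", "gender"]).agg(
    mean_price=("price", "mean"),
    median_price=("price", "median"),
    mean_minus_median=("price", custom),
)

print(summary)
```

Each keyword becomes one flat column in the result, which can make the table easier to export or merge than one with two-level column labels.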