What is Statistical Analysis?

Statistical analysis is the process of collecting, analyzing, and interpreting data to uncover patterns, trends, and relationships. It helps answer key questions like:

- Is a new marketing campaign increasing sales?
- Do two variables have a meaningful correlation?
- Can we predict future trends based on historical data?

What makes data science, science? The answer is statistics. Today, we dive into Descriptive Statistics.
As Jack Reacher (just Reacher) famously said: “In an investigation, assumptions kill.” So, how do we kill assumptions in data? Descriptive statistics is the answer.
What is Descriptive Statistics?

Descriptive statistics summarize and organize data to reveal its main features through numerical measures, tables, and graphs. They don’t predict or infer; they simply describe the data at hand, using measures such as:

- Central tendency: mean, median, mode
- Spread: standard deviation, variance, range
- Shape: skewness, kurtosis

They answer one question: What is the data telling us?
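To make these measures concrete, here is a minimal sketch using only Python's standard `statistics` module. The numbers are hypothetical toy values chosen for illustration (they are not the sales data used later), and the helper name `summarize` is an assumption of this sketch. It shows how a single extreme value pulls the mean and the range while barely moving the median.

```python
import statistics

# Hypothetical toy values for illustration only (not the article's dataset)
values = [10, 12, 12, 13, 15]
with_outlier = values + [100]  # same data plus one extreme observation


def summarize(data):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "stdev": statistics.stdev(data),   # sample standard deviation
        "range": max(data) - min(data),
    }


base = summarize(values)
shifted = summarize(with_outlier)

print("Without outlier:", base)
print("With outlier:   ", shifted)
```

Comparing the two printed summaries is a quick way to see why the mean and the median are usually reported together: when they disagree noticeably, the data is probably skewed or contains extreme values.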
Use Case: A retail store wants to analyze its daily sales over a month to understand performance, trends, and outliers. Descriptive statistics summarize the data and provide insights into sales, variability, and distribution.
Key Parameters:
✔️ Mean & Median: What is the typical daily revenue?
✔️ Mode: Which sales figure occurred most often?
✔️ Standard Deviation & Range: How much do sales fluctuate?
✔️ Skewness & Kurtosis: Is the sales distribution roughly symmetric, or does it have a long tail or extreme values?
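Skewness and kurtosis are the least intuitive items on this list, so before turning to library calls it may help to see the moment-based definitions spelled out. The sketch below is a plain-Python illustration, assuming the population (biased) formulas: skewness as the third central moment divided by the second to the power 1.5, and excess kurtosis as the fourth central moment divided by the squared second, minus 3. The function names and the toy lists are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Population skewness: m3 / m2 ** 1.5 (0 for symmetric data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Excess kurtosis: m4 / m2 ** 2 - 3 (0 for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


# Hypothetical toy data for illustration only
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 10]  # one large value stretches the right tail

print("Symmetric skewness:   ", skewness(symmetric))
print("Right-tailed skewness:", skewness(right_tailed))
print("Symmetric excess kurtosis:", excess_kurtosis(symmetric))
```

A positive skewness indicates a longer right tail (a few unusually high values), a negative one a longer left tail. Excess kurtosis above 0 suggests heavier tails than a normal distribution, below 0 lighter tails.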
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
# Simulated daily sales data for 30 days
sales_data = [238, 258, 226, 270, 268, 318, 295, 288, 278, 265, 308, 340, 320, 298, 275, 285, 295, 325, 330, 310, 350, 360, 348, 376, 390, 410, 428, 395, 385, 378]

# Convert to a pandas DataFrame
df = pd.DataFrame(sales_data, columns=['Daily Sales'])

# Central tendency
mean_sales = np.mean(sales_data)
print(mean_sales)
median_sales = np.median(sales_data)
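print(median_sales)

# Handle mode correctly: if several values tie, stats.mode returns the smallest
# (the keepdims argument requires SciPy 1.9 or later)
mode_result = stats.mode(sales_data, keepdims=True)
mode_sales = mode_result.mode[0] if mode_result.mode.size > 0 else "No mode"
print(mode_sales)

# Spread and shape
std_dev_sales = np.std(sales_data, ddof=1)  # ddof=1 gives the sample standard deviation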
range_sales = np.ptp(sales_data)
iqr_sales = np.percentile(sales_data, 75) - np.percentile(sales_data, 25)
skewness = stats.skew(sales_data)
kurtosis = stats.kurtosis(sales_data)  # excess kurtosis (0 for a normal distribution)

# Display results
print(f"Mean Sales: ${mean_sales:.2f}")
print(f"Median Sales: ${median_sales:.2f}")
print(f"Mode Sales: {mode_sales}")
print(f"Standard Deviation: ${std_dev_sales:.2f}")
print(f"Range: ${range_sales}")
print(f"Interquartile Range (IQR): ${iqr_sales:.2f}")
print(f"Skewness: {skewness:.2f}")
print(f"Kurtosis: {kurtosis:.2f}")

# Visualization: histogram
plt.figure(figsize=(18,5))
plt.hist(sales_data, bins=7, color='skyblue', edgecolor='black', alpha=0.7)  # alpha must lie between 0 and 1
plt.axvline(mean_sales, color='red', linestyle='dashed', linewidth=2, label="Mean")
plt.axvline(median_sales, color='green', linestyle='dashed', linewidth=2, label="Median")
plt.title("Daily Sales Distribution")
plt.xlabel("Sales ($)")
plt.ylabel("Frequency")
plt.legend()
plt.show()
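The use case also mentions outliers, which the code above does not check for explicitly. One way to do that is sketched below, assuming the common 1.5 × IQR rule (Tukey's fences); the article does not prescribe a specific rule, so treat the threshold and the names `lower_fence`, `upper_fence`, and `outliers` as illustrative choices. To stay dependency-free the sketch uses the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points in the same way as NumPy's default percentile method.

```python
import statistics

# Same simulated daily sales data as in the article
sales_data = [238, 258, 226, 270, 268, 318, 295, 288, 278, 265,
              308, 340, 320, 298, 275, 285, 295, 325, 330, 310,
              350, 360, 348, 376, 390, 410, 428, 395, 385, 378]

# Quartiles: quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(sales_data, n=4, method="inclusive")
iqr = q3 - q1

# Tukey's fences (assumed rule): flag anything beyond 1.5 * IQR from the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in sales_data if x < lower_fence or x > upper_fence]

print(f"Q1: ${q1:.2f}, Q3: ${q3:.2f}, IQR: ${iqr:.2f}")
print(f"Fences: ${lower_fence:.2f} to ${upper_fence:.2f}")
print(f"Outliers: {outliers if outliers else 'none'}")
```

An empty result means no day falls outside the fences under this rule. A different multiplier or a z-score threshold could flag different days, so the choice of rule is worth stating whenever outliers are reported.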