Course

Statistical Thinking in Python (Part 1)

Skill Level: Intermediate

4.8+

Updated 03/2026

Build the foundation you need to think statistically and to speak the language of your data.

Python, Probability & Statistics

3 hr

18 videos

61 Exercises

4,550 XP

180K+

Statement of Accomplishment

Course Description

After all the hard work of acquiring data and getting them into a form you can work with, you ultimately want to draw clear, succinct conclusions from them. This crucial last step of a data analysis pipeline hinges on the principles of statistical inference. In this course, you will start building the foundation you need to think statistically, speak the language of your data, and understand what your data are telling you. The foundations of statistical thinking took decades to build, but they can be grasped much faster today with the help of computers. With the power of Python-based tools, you will rapidly get up to speed and begin thinking statistically by the end of this course.

1

Graphical Exploratory Data Analysis

Before diving into sophisticated statistical inference techniques, you should first explore your data by plotting them and computing simple summary statistics. This process, called exploratory data analysis, is a crucial first step in statistical analysis of data.

Introduction to Exploratory Data Analysis

What is the goal of statistical inference?

Advantages of graphical EDA

Plotting a histogram

Plotting a histogram of iris data

Axis labels!

Adjusting the number of bins in a histogram

Plot all of your data: Bee swarm plots

Bee swarm plot

Interpreting a bee swarm plot

Plot all of your data: ECDFs

Computing the ECDF

Plotting the ECDF

Comparison of ECDFs

Onward toward the whole story!
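
As a rough illustration of the ECDF lessons listed above, here is a minimal sketch of computing an empirical cumulative distribution function. It uses only the Python standard library so it runs anywhere. The function name `ecdf` and the sample measurements are made up for illustration and are not taken from the course materials.

```python
def ecdf(data):
    """Return the sorted values x and the cumulative fractions y of an ECDF.

    y[i] is the fraction of observations less than or equal to x[i].
    """
    n = len(data)
    x = sorted(data)
    y = [i / n for i in range(1, n + 1)]
    return x, y


# Hypothetical measurements, invented purely for this example
petal_lengths = [4.7, 4.5, 4.9, 4.0, 4.6]

x, y = ecdf(petal_lengths)
for value, fraction in zip(x, y):
    print(f"{value:.1f} cm or shorter: {fraction:.0%} of observations")
```

Plotting y against x as unconnected points gives the ECDF. Unlike a histogram, it shows every observation and does not depend on a choice of bins.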

2

Quantitative Exploratory Data Analysis

In this chapter, you will compute useful summary statistics, which serve to concisely describe salient features of a dataset with a few numbers.

Introduction to summary statistics: The sample mean and median

Means and medians

Computing means

Percentiles, outliers, and box plots

Computing percentiles

Comparing percentiles to ECDF

Box-and-whisker plot

Variance and standard deviation

Computing the variance

The standard deviation and the variance

Covariance and the Pearson correlation coefficient

Scatter plots

Variance and covariance by looking

Computing the covariance

Computing the Pearson correlation coefficient
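
The summary statistics named in this chapter can be sketched in a few lines. This is a standard-library example under stated assumptions, not the course's own code: the data are invented, the helper names `covariance` and `pearson_r` are hypothetical, and the variance and covariance divide by n (the population form) rather than n - 1.

```python
import math
import statistics


def covariance(x, y):
    """Population covariance: mean product of deviations from the means."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / (statistics.pstdev(x) * statistics.pstdev(y))


# Hypothetical paired measurements, invented for this example
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

mean_x = statistics.fmean(x)        # sample mean
median_x = statistics.median(x)     # 50th percentile
var_x = statistics.pvariance(x)     # mean squared deviation from the mean
std_x = math.sqrt(var_x)            # standard deviation, same units as x
cov_xy = covariance(x, y)
r_xy = pearson_r(x, y)

print(mean_x, median_x, var_x, cov_xy, round(r_xy, 6))
```

Because y is an exact multiple of x here, the Pearson correlation coefficient comes out at 1 up to floating-point rounding. Real data will usually fall somewhere between -1 and 1.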

3

Thinking Probabilistically: Discrete Variables

Statistical inference rests upon probability. Because we can very rarely say anything meaningful with absolute certainty from data, we use probabilistic language to make quantitative statements about data. In this chapter, you will learn how to think probabilistically about discrete quantities: those that can only take certain values, like integers.

Probabilistic logic and statistical inference

What is the goal of statistical inference?

Why do we use the language of probability?

Random number generators and hacker statistics

Generating random numbers using the np.random module

The np.random module and Bernoulli trials

How many defaults might we expect?

Will the bank fail?

Probability distributions and stories: The Binomial distribution

Sampling out of the Binomial distribution

Plotting the Binomial PMF

Poisson processes and the Poisson distribution

Relationship between Binomial and Poisson distributions

How many no-hitters in a season?

Was 2015 anomalous?
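
The simulation approach behind the lessons above (often called hacker statistics) can be sketched as repeated Bernoulli trials. The numbers below (100 loans, a 5% default probability, 1,000 repetitions) and the function name `perform_bernoulli_trials` are assumptions chosen for illustration. The sketch uses the standard library `random` module rather than the `np.random` module the course works with.

```python
import random


def perform_bernoulli_trials(n, p, rng):
    """Run n Bernoulli trials with success probability p; return the success count."""
    return sum(rng.random() < p for _ in range(n))


# Seeded generator so the simulation is reproducible
rng = random.Random(42)

# Assumed scenario: 100 loans, each defaulting independently with probability 0.05
n_defaults = [perform_bernoulli_trials(100, 0.05, rng) for _ in range(1000)]

mean_defaults = sum(n_defaults) / len(n_defaults)
# Fraction of simulated portfolios with 10 or more defaults
frac_ten_or_more = sum(k >= 10 for k in n_defaults) / len(n_defaults)

print(f"Average defaults per 100 loans: {mean_defaults:.2f}")
print(f"Estimated P(10 or more defaults): {frac_ten_or_more:.3f}")
```

The number of successes in n independent trials follows the Binomial distribution, so the simulated average should land near n times p, which is 5 in this scenario.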

4

Thinking Probabilistically: Continuous Variables

It’s time to move on to continuous variables: those that can take on any fractional value. Many of the principles are the same, but there are some subtleties. At the end of this final chapter, you will be speaking the probabilistic language you need to launch into the inference techniques covered in the sequel to this course.

Probability density functions

Interpreting PDFs

Interpreting CDFs

Introduction to the Normal distribution

The Normal PDF

The Normal CDF

The Normal distribution: Properties and warnings

Gauss and the 10 Deutschmark banknote

Are the Belmont Stakes results Normally distributed?

What are the chances of a horse matching or beating Secretariat's record?

The Exponential distribution

Matching a story and a distribution

Waiting for the next Secretariat

If you have a story, you can simulate it!

Distribution of no-hitters and cycles

Final thoughts
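
For continuous variables, one way to check a simulation is to compare sampled values with the theoretical distribution. The sketch below does that for a Normal and an Exponential distribution using only the standard library. All parameters (mean 0, standard deviation 1, a mean waiting time of 5 units) are arbitrary assumptions for illustration and are not values from the course.

```python
import random
from statistics import NormalDist

rng = random.Random(0)  # seeded for reproducibility

# Normal distribution: empirical vs. theoretical CDF at a single point
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
empirical = sum(s <= 1.0 for s in samples) / len(samples)
theoretical = NormalDist(mu=0.0, sigma=1.0).cdf(1.0)
print(f"P(X <= 1): empirical {empirical:.3f}, theoretical {theoretical:.3f}")

# Exponential distribution: waiting times between events of a Poisson process.
# expovariate takes the rate, which is 1 divided by the mean waiting time.
mean_wait_assumed = 5.0
waits = [rng.expovariate(1.0 / mean_wait_assumed) for _ in range(10_000)]
mean_wait = sum(waits) / len(waits)
print(f"Average simulated waiting time: {mean_wait:.2f}")
```

With 10,000 samples, the empirical fraction should agree with the theoretical CDF to about two decimal places, and the average waiting time should sit close to the assumed mean.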

Rating: 4.8 from 110 reviews (breakdown from highest to lowest rating: 85%, 15%, 0%, 0%, 0%)

FAQs

Is this course for Python beginners or do I need prior experience?

You need intermediate Python skills including functions and the Python toolbox. This is an intermediate-level statistics course, not a Python introduction.

What topics are covered in the exploratory data analysis chapters?

You will learn graphical EDA through plotting, then quantitative EDA with summary statistics to describe key features of your datasets before moving to probability.

Does the course cover both discrete and continuous probability?

Yes. Chapter 3 covers probabilistic thinking for discrete variables like integers, and Chapter 4 extends these concepts to continuous variables with fractional values.

Will this course prepare me for statistical inference?

Yes. It builds the foundation of statistical thinking and probabilistic language you need to move into the inference techniques covered in Statistical Thinking in Python Part 2.

How long does this course typically take?

It has 4 chapters and 61 exercises. The estimated course length is 3 hours, and the median completion time is about 3.6 hours.
