Course

Statistical Thinking in Python (Part 2)

Skill Level: Intermediate

4.7+

Updated 07/2024

Learn to perform the two key tasks in statistical inference: parameter estimation and hypothesis testing.

Python, Probability & Statistics

4 hr

15 videos

66 Exercises

5,350 XP

93,567

Statement of Accomplishment

Course Description

After completing Statistical Thinking in Python (Part 1), you have the probabilistic mindset and foundational hacker stats skills to dive into data sets and extract useful information from them. In this course, you will do just that, expanding and honing your hacker stats toolbox to perform the two key tasks in statistical inference: parameter estimation and hypothesis testing. You will work with real data sets as you learn, culminating in an analysis of measurements of the beaks of Darwin's famous finches. You will emerge from this course with new knowledge and lots of practice under your belt, ready to attack your own inference problems out in the world.

Prerequisites

Statistical Thinking in Python (Part 1)

1

Parameter estimation by optimization

When doing statistical inference, we speak the language of probability. A probability distribution that describes your data has parameters. So, a major goal of statistical inference is to estimate the values of these parameters, which allows us to concisely and unambiguously describe our data and draw conclusions from it. In this chapter, you will learn how to find the optimal parameters, those that best describe your data.

Optimal parameters

How often do we get no-hitters?

Do the data follow our story?

How is this parameter optimal?

Linear regression by least squares

EDA of literacy/fertility data

Linear regression

How is it optimal?

The importance of EDA: Anscombe's quartet

The importance of EDA

Linear regression on appropriate Anscombe data

Linear regression on all Anscombe data
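
As a rough illustration of what "optimal parameters" means in this chapter, the sketch below fits a line by least squares with NumPy and checks that nudging the slope away from the fitted value increases the residual sum of squares. The data are synthetic and made up for this example (they are not the course's literacy/fertility or Anscombe data), and the helper name `rss` is an assumption introduced here.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noisy straight line.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.size)

# Least-squares fit of a degree-1 polynomial: returns slope, then intercept.
slope, intercept = np.polyfit(x, y, 1)

def rss(a, b):
    """Residual sum of squares for the line y = a * x + b."""
    return np.sum((y - (a * x + b)) ** 2)

# The fitted parameters should give a smaller RSS than nearby slopes.
best = rss(slope, intercept)
worse = [rss(slope + delta, intercept) for delta in (-0.1, 0.1)]

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"RSS at the fit: {best:.2f}; RSS at perturbed slopes: {worse}")
```

"Optimal" here means only that these parameters minimize the sum of squared residuals. As the Anscombe lessons suggest, that says nothing about whether a straight line is an appropriate model for the data.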

2

Bootstrap confidence intervals

To "pull yourself up by your bootstraps" is a classic idiom meaning that you achieve a difficult task by yourself with no help at all. In statistical inference, you want to know what would happen if you could repeat your data acquisition an infinite number of times. This task is impossible, but can we use only the data we actually have to get close to the same result as an infinitude of experiments? The answer is yes! The technique to do it is aptly called bootstrapping. This chapter will introduce you to this extraordinarily powerful tool.

Generating bootstrap replicates

Getting the terminology down

Bootstrapping by hand

Visualizing bootstrap samples

Bootstrap confidence intervals

Generating many bootstrap replicates

Bootstrap replicates of the mean and the SEM

Confidence intervals of rainfall data

Bootstrap replicates of other statistics

Confidence interval on the rate of no-hitters

Pairs bootstrap

A function to do pairs bootstrap

Pairs bootstrap of literacy/fertility data

Plotting bootstrap regressions
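
The bootstrap idea described in this chapter can be sketched in a few lines of NumPy: resample the observed data with replacement, recompute the statistic each time, and read a confidence interval off the percentiles of the replicates. This is a minimal sketch under stated assumptions: the data are synthetic, and the helper name `draw_bs_reps` and its signature are chosen for this illustration.

```python
import numpy as np

def draw_bs_reps(data, func, size=1, rng=None):
    """Return `size` bootstrap replicates of the statistic `func`."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    reps = np.empty(size)
    for i in range(size):
        # A bootstrap sample: same length as the data, drawn with replacement.
        sample = rng.choice(data, size=len(data), replace=True)
        reps[i] = func(sample)
    return reps

# Synthetic measurements (an assumption for illustration).
rng = np.random.default_rng(0)
data = rng.exponential(scale=5.0, size=200)

# Bootstrap replicates of the mean and a 95% percentile confidence interval.
bs_means = draw_bs_reps(data, np.mean, size=2000, rng=rng)
ci_low, ci_high = np.percentile(bs_means, [2.5, 97.5])

print(f"sample mean = {np.mean(data):.3f}")
print(f"95% bootstrap CI for the mean: [{ci_low:.3f}, {ci_high:.3f}]")
```

The same function works for other statistics, for example by passing `np.median` or `np.var` as `func`.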

3

Introduction to hypothesis testing

You now know how to define and estimate parameters given a model. But the question remains: how reasonable is it to observe your data if a model is true? This question is addressed by hypothesis tests. They are the icing on the inference cake. After completing this chapter, you will be able to carefully construct and test hypotheses using hacker statistics.

Formulating and simulating a hypothesis

Generating a permutation sample

Visualizing permutation sampling

Test statistics and p-values

Test statistics

What is a p-value?

Generating permutation replicates

Look before you leap: EDA before hypothesis testing

Permutation test on frog data

Bootstrap hypothesis tests

A one-sample bootstrap hypothesis test

A two-sample bootstrap hypothesis test for difference of means
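
A permutation test of the kind named in this chapter can be sketched as follows. Under the null hypothesis that two groups come from the same distribution, group labels are interchangeable, so the pooled data are shuffled, split again, and the test statistic recomputed many times. The p-value is the fraction of permutation replicates at least as extreme as the observed statistic. The data and the function names `diff_of_means` and `permutation_p_value` are assumptions made for this example.

```python
import numpy as np

def diff_of_means(a, b):
    """Test statistic: difference between the two group means."""
    return np.mean(a) - np.mean(b)

def permutation_p_value(a, b, n_perm=5000, rng=None):
    """One-sided p-value for 'mean of a is greater than mean of b'."""
    rng = np.random.default_rng() if rng is None else rng
    observed = diff_of_means(a, b)
    pooled = np.concatenate([a, b])
    reps = np.empty(n_perm)
    for i in range(n_perm):
        # Simulate the null: scramble the labels, then split into two groups.
        shuffled = rng.permutation(pooled)
        reps[i] = diff_of_means(shuffled[:len(a)], shuffled[len(a):])
    # Fraction of replicates at least as extreme as what was observed.
    return np.mean(reps >= observed)

# Synthetic groups (an assumption for illustration) with different means.
rng = np.random.default_rng(1)
group_a = rng.normal(1.5, 1.0, size=40)
group_b = rng.normal(0.0, 1.0, size=40)

p_value = permutation_p_value(group_a, group_b, rng=rng)
print(f"p-value = {p_value:.4f}")
```

Here "more extreme" means a difference of means at least as large as the observed one. A two-sided test would compare absolute values instead.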

4

Hypothesis test examples

As you saw in the last chapter, hypothesis testing can be a bit tricky. You need to define the null hypothesis, figure out how to simulate it, and define clearly what it means to be "more extreme" in order to compute the p-value. Like any skill, practice makes perfect, and this chapter gives you some good practice with hypothesis tests.

A/B testing

The vote for the Civil Rights Act in 1964

What is equivalent?

A time-on-website analog

What should you have done first?

Test of correlation

Simulating a null hypothesis concerning correlation

Hypothesis test on Pearson correlation

Do neonicotinoid insecticides have unintended consequences?

Bootstrap hypothesis test on bee sperm counts
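
The three steps this chapter names (define the null hypothesis, simulate it, and decide what "more extreme" means) can be sketched for a test of correlation. In this sketch the null hypothesis is that two variables are uncorrelated, it is simulated by permuting one variable while leaving the other fixed, and "more extreme" means a Pearson correlation at least as large as the observed one. The data are synthetic and the helper name `pearson_r` is an assumption for this example.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

# Synthetic, positively correlated data (an assumption for illustration).
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = x + rng.normal(size=100)

# Step 1: the observed test statistic.
r_obs = pearson_r(x, y)

# Step 2: simulate the null by permuting x, which breaks any pairing with y.
perm_reps = np.array([pearson_r(rng.permutation(x), y) for _ in range(5000)])

# Step 3: the p-value is the fraction of replicates at least as large as r_obs.
p_value = np.mean(perm_reps >= r_obs)

print(f"observed r = {r_obs:.3f}, p-value = {p_value:.4f}")
```

A p-value of exactly zero from a run like this only means that none of the simulated replicates reached the observed value. It would be more accurate to report it as below one over the number of replicates.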

5

Putting it all together: a case study

Every year for the past 40-plus years, Peter and Rosemary Grant have gone to the Galápagos island of Daphne Major and collected data on Darwin's finches. Using your skills in statistical inference, you will spend this chapter with their data and witness firsthand, through data, evolution in action. It's an exhilarating way to end the course!

Finch beaks and the need for statistics

EDA of beak depths of Darwin's finches

ECDFs of beak depths

Parameter estimates of beak depths

Hypothesis test: Are beaks deeper in 2012?

Variation in beak shapes

EDA of beak length and depth

Linear regressions

Displaying the linear regression results

Beak length to depth ratio

How different is the ratio?

Calculation of heritability

EDA of heritability

Correlation of offspring and parental data

Pearson correlation of offspring and parental data

Measuring heritability

Is beak depth heritable at all in G. scandens?

Final thoughts

Earn Statement of Accomplishment

Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review

Don’t just take our word for it

4.7 from 252 reviews

83%

15%

2%

0%

0%

Abdelrahman Mahmoud

15 hours ago

Ahmed

2 days ago

Rogelio Jr.

last week

N/A

Lorenzo

3 weeks ago

Very good this one.! Thank you

Sohrab

3 weeks ago

Thiago de Mello

3 weeks ago

FAQs

Do I need to complete Part 1 before taking this course?

Yes. Statistical Thinking in Python Part 1 is a prerequisite. This course builds directly on the probabilistic mindset and hacker stats skills introduced there.

What real-world dataset is used in the final case study?

The course concludes with an analysis of beak measurements from Darwin's finches, collected over more than 40 years on the Galápagos island of Daphne Major, which shows evolution in action through data.

What are the two main statistical inference tasks this course covers?

Parameter estimation and hypothesis testing. You learn to estimate distribution parameters using optimization and bootstrapping, and to test hypotheses using permutation and simulation.

What is bootstrapping and why is it important?

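Bootstrapping is a resampling technique that uses only the data you actually have to approximate what would happen if you could repeat your data acquisition many times. You repeatedly sample, with replacement, from your existing dataset and recompute the statistic of interest each time. Chapter 2 teaches you to build bootstrap confidence intervals.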

How many chapters and exercises does this course have?

It has 5 chapters with 15 videos and 66 exercises, worth 5,350 XP. The listed course length is 4 hours.
