Napoleon-Christos Oikonomou has completed
Statistical Thinking in Python (Part 2)
4 hours
5,350 XP
Course Description
After completing Statistical Thinking in Python (Part 1), you have the probabilistic mindset and foundational hacker stats skills to dive into data sets and extract useful information from them. In this course, you will do just that, expanding and honing your hacker stats toolbox to perform the two key tasks in statistical inference: parameter estimation and hypothesis testing. You will work with real data sets as you learn, culminating with an analysis of measurements of the beaks of Darwin's famous finches. You will emerge from this course with new knowledge and lots of practice under your belt, ready to attack your own inference problems out in the world.
1. Parameter estimation by optimization
When doing statistical inference, we speak the language of probability. A probability distribution that describes your data has parameters. So, a major goal of statistical inference is to estimate the values of these parameters, which allows us to concisely and unambiguously describe our data and draw conclusions from it. In this chapter, you will learn how to find the optimal parameters, those that best describe your data.
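As a rough illustration of what "optimal parameters" can mean, the sketch below estimates the mean of an exponential distribution and fits a line by least squares. The data are synthetic stand-ins, not the course data sets, and the names `waits`, `tau_hat`, and `rss` are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data (an assumption, not a course data set):
# waiting times drawn from an exponential distribution with mean 700.
true_mean = 700.0
waits = rng.exponential(true_mean, size=5000)

# For the exponential distribution, the maximum-likelihood estimate of
# the mean parameter is the sample mean.
tau_hat = np.mean(waits)

# Linear regression by least squares on synthetic (x, y) pairs.
x = np.linspace(0, 100, 200)
y = -0.05 * x + 8.0 + rng.normal(0, 0.5, size=x.size)
slope, intercept = np.polyfit(x, y, 1)


def rss(a, b):
    """Residual sum of squares for the line y = a * x + b."""
    return np.sum((y - (a * x + b)) ** 2)


print(f"estimated mean waiting time: {tau_hat:.1f}")
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
# Nudging the slope away from the fitted value increases the RSS.
print(rss(slope, intercept) < rss(slope + 0.01, intercept))
```

The final comparison is one way to see in what sense the fitted slope is optimal: any other slope gives a larger sum of squared residuals.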
- Optimal parameters (50 XP)
- How often do we get no-hitters? (100 XP)
- Do the data follow our story? (100 XP)
- How is this parameter optimal? (100 XP)
- Linear regression by least squares (50 XP)
- EDA of literacy/fertility data (100 XP)
- Linear regression (100 XP)
- How is it optimal? (100 XP)
- The importance of EDA: Anscombe's quartet (50 XP)
- The importance of EDA (50 XP)
- Linear regression on appropriate Anscombe data (100 XP)
- Linear regression on all Anscombe data (100 XP)
2. Bootstrap confidence intervals
To "pull yourself up by your bootstraps" is a classic idiom meaning that you achieve a difficult task by yourself with no help at all. In statistical inference, you want to know what would happen if you could repeat your data acquisition an infinite number of times. This task is impossible, but can we use only the data we actually have to get close to the same result as an infinitude of experiments? The answer is yes! The technique to do it is aptly called bootstrapping. This chapter will introduce you to this extraordinarily powerful tool.
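A minimal sketch of the idea, assuming synthetic measurements in place of the course data: resample the observed data with replacement many times, compute the statistic on each resample, and read a confidence interval off the percentiles of the results. The helper names `bootstrap_replicate` and `draw_bootstrap_replicates` are illustrative, not taken from the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)


def bootstrap_replicate(data, func, rng):
    """Resample data with replacement and apply func to the resample."""
    resample = rng.choice(data, size=len(data), replace=True)
    return func(resample)


def draw_bootstrap_replicates(data, func, size, rng):
    """Compute `size` bootstrap replicates of the statistic `func`."""
    return np.array([bootstrap_replicate(data, func, rng) for _ in range(size)])


# Synthetic stand-in measurements (an assumption for this example).
data = rng.normal(800, 100, size=150)

bs_means = draw_bootstrap_replicates(data, np.mean, 10_000, rng)

# 95% confidence interval from the percentiles of the replicates.
ci_low, ci_high = np.percentile(bs_means, [2.5, 97.5])

# The spread of the replicates should be close to the standard error
# of the mean computed directly from the data.
sem = np.std(data) / np.sqrt(len(data))

print(f"95% CI for the mean: [{ci_low:.1f}, {ci_high:.1f}]")
print(f"SEM = {sem:.2f}, std of replicates = {np.std(bs_means):.2f}")
```

Passing a different function, such as `np.median` or `np.var`, in place of `np.mean` gives replicates of other statistics with no other changes.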
- Generating bootstrap replicates (50 XP)
- Getting the terminology down (50 XP)
- Bootstrapping by hand (50 XP)
- Visualizing bootstrap samples (100 XP)
- Bootstrap confidence intervals (50 XP)
- Generating many bootstrap replicates (100 XP)
- Bootstrap replicates of the mean and the SEM (100 XP)
- Confidence intervals of rainfall data (50 XP)
- Bootstrap replicates of other statistics (100 XP)
- Confidence interval on the rate of no-hitters (100 XP)
- Pairs bootstrap (50 XP)
- A function to do pairs bootstrap (100 XP)
- Pairs bootstrap of literacy/fertility data (100 XP)
- Plotting bootstrap regressions (100 XP)
3. Introduction to hypothesis testing
You now know how to define and estimate parameters given a model. But the question remains: how reasonable is it to observe your data if a model is true? This question is addressed by hypothesis tests. They are the icing on the inference cake. After completing this chapter, you will be able to carefully construct and test hypotheses using hacker statistics.
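One common way to ask "how reasonable is it to observe your data if a model is true?" is a permutation test. The sketch below assumes two synthetic groups of measurements and the null hypothesis that both come from the same distribution; the function names and the data are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)


def permutation_sample(a, b, rng):
    """Pool the two samples, shuffle, and split back into the original sizes."""
    pooled = rng.permutation(np.concatenate((a, b)))
    return pooled[:len(a)], pooled[len(a):]


def diff_of_means(a, b):
    """Test statistic: difference between the two sample means."""
    return np.mean(a) - np.mean(b)


# Synthetic stand-in measurements for two groups (an assumption).
group_a = rng.normal(0.70, 0.15, size=20)
group_b = rng.normal(0.40, 0.15, size=20)

observed = diff_of_means(group_a, group_b)

# Simulate the null hypothesis: group labels carry no information.
perm_reps = np.array([
    diff_of_means(*permutation_sample(group_a, group_b, rng))
    for _ in range(10_000)
])

# p-value: fraction of simulated statistics at least as large as observed.
p_value = np.mean(perm_reps >= observed)

print(f"observed difference of means: {observed:.3f}")
print(f"p-value: {p_value:.4f}")
```

A small p-value says the observed difference would be rare if the two groups were identically distributed; it does not by itself say how large or important the difference is.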
- Formulating and simulating a hypothesis (50 XP)
- Generating a permutation sample (100 XP)
- Visualizing permutation sampling (100 XP)
- Test statistics and p-values (50 XP)
- Test statistics (50 XP)
- What is a p-value? (50 XP)
- Generating permutation replicates (100 XP)
- Look before you leap: EDA before hypothesis testing (100 XP)
- Permutation test on frog data (100 XP)
- Bootstrap hypothesis tests (50 XP)
- A one-sample bootstrap hypothesis test (100 XP)
- A two-sample bootstrap hypothesis test for difference of means (100 XP)
4. Hypothesis test examples
As you saw in the last chapter, hypothesis testing can be a bit tricky. You need to define the null hypothesis, figure out how to simulate it, and define clearly what it means to be "more extreme" in order to compute the p-value. Like any skill, practice makes perfect, and this chapter gives you some good practice with hypothesis tests.
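The three steps named above can be sketched for a test of correlation. This is a minimal example under stated assumptions: the paired data are synthetic, the null hypothesis is that the two variables are uncorrelated, and "more extreme" is taken to mean a Pearson correlation at least as negative as the observed one. The names `pearson_r`, `r_obs`, and `perm_reps` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)


def pearson_r(x, y):
    """Pearson correlation coefficient between two arrays."""
    return np.corrcoef(x, y)[0, 1]


# Synthetic paired data with a built-in negative relationship (an assumption).
x = rng.normal(50, 10, size=100)
y = -0.8 * x + rng.normal(0, 8, size=100)

# Step 1: null hypothesis -- x and y are uncorrelated.
r_obs = pearson_r(x, y)

# Step 2: simulate the null by permuting x while leaving y fixed,
# which destroys any pairing between the two variables.
perm_reps = np.array([pearson_r(rng.permutation(x), y) for _ in range(5_000)])

# Step 3: "more extreme" here means at least as negative as observed.
p_value = np.mean(perm_reps <= r_obs)

print(f"observed Pearson r: {r_obs:.3f}")
print(f"p-value: {p_value:.4f}")
```

If no permutation replicate is as extreme as the observed value, the computed p-value is 0, which only means the true p-value is smaller than roughly one over the number of replicates.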
- A/B testing (50 XP)
- The vote for the Civil Rights Act in 1964 (100 XP)
- What is equivalent? (50 XP)
- A time-on-website analog (100 XP)
- What should you have done first? (50 XP)
- Test of correlation (50 XP)
- Simulating a null hypothesis concerning correlation (50 XP)
- Hypothesis test on Pearson correlation (100 XP)
- Do neonicotinoid insecticides have unintended consequences? (100 XP)
- Bootstrap hypothesis test on bee sperm counts (100 XP)
5. Putting it all together: a case study
Every year for the past 40-plus years, Peter and Rosemary Grant have gone to the Galápagos island of Daphne Major and collected data on Darwin's finches. Using your skills in statistical inference, you will spend this chapter with their data and witness firsthand, through data, evolution in action. It's an exhilarating way to end the course!
- Finch beaks and the need for statistics (50 XP)
- EDA of beak depths of Darwin's finches (100 XP)
- ECDFs of beak depths (100 XP)
- Parameter estimates of beak depths (100 XP)
- Hypothesis test: Are beaks deeper in 2012? (100 XP)
- Variation in beak shapes (50 XP)
- EDA of beak length and depth (100 XP)
- Linear regressions (100 XP)
- Displaying the linear regression results (100 XP)
- Beak length to depth ratio (100 XP)
- How different is the ratio? (50 XP)
- Calculation of heritability (50 XP)
- EDA of heritability (100 XP)
- Correlation of offspring and parental data (100 XP)
- Pearson correlation of offspring and parental data (100 XP)
- Measuring heritability (100 XP)
- Is beak depth heritable at all in G. scandens? (100 XP)
- Final thoughts (50 XP)
Datasets
- Anscombe data
- Bee sperm counts
- Female literacy and fertility
- Finch beaks (1975)
- Finch beaks (2012)
- Fortis beak depth heredity
- Frog tongue data
- Major League Baseball no-hitters
- Scandens beak depth heredity
- Sheffield Weather Station
Prerequisites
- Statistical Thinking in Python (Part 1)
Justin Bois
Lecturer at the California Institute of Technology
Justin Bois is a Teaching Professor in the Division of Biology and Biological Engineering at the California Institute of Technology. He teaches nine different classes there, nearly all of which heavily feature Python. He is dedicated to empowering students in the biological sciences with quantitative tools, particularly data analysis skills. Beyond biologists, he is thrilled to develop courses for DataCamp, whose students are an excited bunch of burgeoning data scientists!