- 18 Videos
- 61 Exercises
- 3 hours
- 3,752 Participants
- 4,550 XP

**Instructor(s):**

Justin Bois is a lecturer in the Division of Biology and Biological Engineering at the California Institute of Technology. He teaches nine different classes there, nearly all of which heavily feature Python. He is dedicated to empowering students in the biological sciences with quantitative tools, particularly data analysis skills. Beyond biologists, he is thrilled to develop courses for DataCamp, whose students are an excited bunch of burgeoning data scientists!

Hugo Bowne-Anderson

Vincent Lan

Yashas Roy

After all of the hard work of acquiring data and getting them into a form you can work with, you ultimately want to draw clear, succinct conclusions from them. This crucial last step of a data analysis pipeline hinges on the principles of statistical inference. In this course, you will start building the foundation you need to think statistically, to speak the language of your data, and to understand what they are telling you. The foundations of statistical thinking took decades upon decades to build, but they can be grasped much faster today with the help of computers. With the power of Python-based tools, you will rapidly get up to speed and begin thinking statistically by the end of this course.

Look before you leap! A very important proverb, indeed. Before diving headlong into sophisticated statistical inference techniques, you should first explore your data by plotting them and computing simple summary statistics. This process, called exploratory data analysis, is a crucial first step in the statistical analysis of data, so it is a fitting subject for the first chapter of Statistical Thinking in Python.

- Introduction to exploratory data analysis 50 xp
- Tukey's comments on EDA 50 xp
- Advantages of graphical EDA 50 xp
- Plotting a histogram 50 xp
- Plotting a histogram of iris data 100 xp
- Axis labels! 100 xp
- Adjusting the number of bins in a histogram 100 xp
- Plotting all of your data: Bee swarm plots 50 xp
- Bee swarm plot 100 xp
- Interpreting a bee swarm plot 50 xp
- Plotting all of your data: Empirical cumulative distribution functions 50 xp
- Computing the ECDF 100 xp
- Plotting the ECDF 100 xp
- Comparison of ECDFs 100 xp
- Onward toward the whole story 50 xp
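
As a rough illustration of the ECDF exercises listed above, here is a minimal sketch of how an empirical cumulative distribution function can be computed. It uses only the Python standard library, which is an assumption made for portability (the course itself may use other tools), and the function name `ecdf` and the petal lengths are made up for this example.

```python
def ecdf(data):
    """Return sorted values x and cumulative fractions y for 1-D data."""
    x = sorted(data)
    n = len(x)
    # The i-th smallest value (1-based) has a fraction i / n of the
    # data points less than or equal to it.
    y = [(i + 1) / n for i in range(n)]
    return x, y


# Hypothetical petal lengths in cm (illustrative values, not real data)
petal_lengths = [4.7, 4.5, 4.9, 4.0, 4.6]

x, y = ecdf(petal_lengths)
for xi, yi in zip(x, y):
    print(f"{xi:.1f} cm -> fraction of data <= value: {yi:.2f}")
```

Plotting `y` against `x` as unconnected points gives the usual ECDF picture, in which every data point is shown and no binning choice is needed.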

In the previous chapter, you learned how to explore data graphically. In this chapter, you will compute useful summary statistics, which concisely describe the salient features of a data set with a few numbers.
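
To make the idea of summary statistics concrete, the sketch below computes a mean, a median, and a standard deviation with Python's built-in `statistics` module. The measurements are hypothetical values chosen for illustration, and the choice of these three statistics is an assumption about what the chapter covers.

```python
import statistics

# Hypothetical measurements in cm (illustrative values, not real data)
measurements = [4.7, 4.5, 4.9, 4.0, 4.6]

mean = statistics.mean(measurements)      # arithmetic average
median = statistics.median(measurements)  # middle value of the sorted data
std = statistics.pstdev(measurements)     # population standard deviation

print(f"mean = {mean:.2f}, median = {median:.2f}, std = {std:.2f}")
```

Note that the median is less sensitive than the mean to a single extreme value, which is one reason to report more than one summary number.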

Statistical inference rests upon probability. Because we can very rarely say anything meaningful with absolute certainty from data, we use probabilistic language to make quantitative statements about data. In this chapter, you will learn how to think probabilistically about discrete quantities, those that can only take certain values, like integers. It is an important first step in building the probabilistic language necessary to think statistically.
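
One way to think probabilistically about a discrete quantity is to simulate it many times and count outcomes. The sketch below estimates the probability of getting all heads in four fair coin flips; the function name `prob_all_heads`, the seed, and the coin-flip scenario are assumptions made for illustration rather than material taken from the course.

```python
import random


def prob_all_heads(n_flips, n_trials, seed=42):
    """Estimate the probability that n_flips fair coin flips are all heads."""
    rng = random.Random(seed)  # local generator so results are reproducible
    n_all_heads = 0
    for _ in range(n_trials):
        # Count heads in one trial; each flip is heads with probability 0.5
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads == n_flips:
            n_all_heads += 1
    return n_all_heads / n_trials


estimate = prob_all_heads(n_flips=4, n_trials=10_000)
exact = 0.5 ** 4  # 0.0625, the exact probability for a fair coin
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

The simulated value fluctuates around the exact one, and the gap tends to shrink as `n_trials` grows.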

In the previous chapter, you learned about probability distributions of discrete variables. Now it is time to move on to continuous variables, such as those that can take on any fractional value. Many of the principles are the same, but there are some subtleties. By the end of this final chapter of the course, you will be speaking the probabilistic language you need to launch into the inference techniques covered in the sequel to this course.
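
For continuous variables, probabilities attach to ranges of values rather than to single values. As a hedged sketch, the code below draws samples from a standard Normal distribution and compares the fraction of samples at or below a threshold with the theoretical cumulative distribution function. The choice of the Normal distribution, the threshold, and the sample size are all assumptions made for illustration.

```python
import random
from statistics import NormalDist

rng = random.Random(42)   # seeded for reproducibility
n_samples = 10_000
threshold = 1.0

# Draw samples from a Normal distribution with mean 0 and std 1
samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# Empirical estimate of P(X <= threshold)
empirical = sum(s <= threshold for s in samples) / n_samples

# Theoretical value from the Normal cumulative distribution function
theoretical = NormalDist(mu=0.0, sigma=1.0).cdf(threshold)

print(f"empirical: {empirical:.3f}, theoretical: {theoretical:.3f}")
```

The two numbers should agree closely, which is the same comparison an ECDF plot makes visually against a theoretical CDF.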