If you were a man applying to Berkeley's graduate school in 1973, you were noticeably more likely to be admitted than your female peers (roughly 44% of male applicants versus 35% of female applicants). On the surface, this looks like a flagrant case of gender discrimination. However, a closer inspection of the data reveals that women were more likely to apply to competitive departments with low admission rates overall, and this, rather than bias within departments, accounts for most of the difference between the sexes.
The Berkeley problem is a classic example of Simpson's paradox, an important concept in statistics in which a trend that appears in aggregated data weakens, disappears, or even reverses once you control for another factor (here, the department applied to). Knowledge of this concept can prove critical in areas such as education policy, human resources, or any other field where bias or discrimination is a concern.
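To make the reversal concrete, here is a minimal sketch using small, entirely hypothetical admission counts (not the real Berkeley figures) for two made-up departments, "A" and "B". Within each department women are admitted at a higher rate than men, yet once the departments are pooled, women's overall rate is lower, because most women applied to the more competitive department B. The helper names `rate`, `department_rates`, and `overall_rates` are illustrative only.

```python
# Hypothetical admissions counts, NOT the real Berkeley data.
# Each entry is (applicants, admitted) for one department and sex.
data = {
    "A": {"men": (100, 80), "women": (20, 17)},   # less competitive department
    "B": {"men": (20, 4), "women": (100, 25)},    # more competitive department
}


def rate(applicants, admitted):
    """Admission rate as a fraction of applicants."""
    return admitted / applicants


def department_rates(data):
    """Admission rate for every (department, sex) pair, i.e. controlling for department."""
    return {
        (dept, sex): rate(*counts)
        for dept, groups in data.items()
        for sex, counts in groups.items()
    }


def overall_rates(data):
    """Admission rate per sex after pooling all departments together."""
    totals = {}
    for groups in data.values():
        for sex, (applicants, admitted) in groups.items():
            total_applicants, total_admitted = totals.get(sex, (0, 0))
            totals[sex] = (total_applicants + applicants, total_admitted + admitted)
    return {sex: rate(*counts) for sex, counts in totals.items()}


if __name__ == "__main__":
    for (dept, sex), r in department_rates(data).items():
        print(f"Dept {dept}, {sex}: {r:.0%}")
    for sex, r in overall_rates(data).items():
        print(f"Overall, {sex}: {r:.0%}")
```

Comparing the two sets of printed rates shows the paradox: the per-department comparison favors women in both A and B, while the pooled comparison favors men, purely because of how applicants are distributed across departments.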
Students should know the common data structures in R, as taught in DataCamp's Introduction to R course, and have some understanding of logistic regression, as taught in Multiple and Logistic Regression. Finally, they should have experience with the tidyverse suite of packages, particularly ggplot2, which they can gain in Introduction to the Tidyverse.
Data Scientist at BBC
Joshua Feldman is a data scientist at the BBC, where he uses a range of machine learning techniques to solve business problems and help the organization better understand its audiences. He mainly codes in R and SQL, with a particular interest in computational text analysis and data visualization. He holds an MSc in quantitative research methodology from the London School of Economics.