Preparing for a Machine Learning (ML) interview can be challenging. Where should you start? Which topics should you focus on? Theory or practice? Fear not! In this course, you will learn to answer 30 non-trivial questions that often come up in ML interviews. These questions revolve around seven important topics: data preprocessing, data visualization, supervised learning, unsupervised learning, model ensembling, model selection, and model evaluation. You will practice these concepts while learning to predict the rating of an Android app or segment mall customers based on their purchasing behavior. Keep in mind that this course is meant to be more challenging than your average DataCamp course. Make sure to complete the prerequisite courses so you can get the most out of the topics we will cover!
Data preprocessing and visualization
This chapter discusses important topics related to data processing, such as data normalization, handling missing data, and identifying outliers.
- Data normalization (50 xp)
- Understanding when to normalize data (100 xp)
- Normalizing features (100 xp)
- Handling missing data (50 xp)
- Exploring and summarizing missing data (100 xp)
- Show me your missingness (100 xp)
- Imputing missing data (100 xp)
- Evaluating imputation models (100 xp)
- Detecting anomalies in data (50 xp)
- Univariate outlier detection: the IQR rule (100 xp)
- The KNN distance score (100 xp)
- The LOF score (100 xp)
Supervised learning
This chapter discusses important topics within supervised learning, such as model interpretability, regularization, the bias-variance tradeoff, and model ensembling.
- Interpretable models (50 xp)
- Interpreting linear regression (100 xp)
- Interpreting decision tree (100 xp)
- Regularization (50 xp)
- Ridge regression (100 xp)
- Lasso regression (100 xp)
- Elastic net regression (100 xp)
- Bias and variance (50 xp)
- Bias-variance analysis (100 xp)
- Reducing avoidable bias (100 xp)
- Building ensemble models (50 xp)
- Recruiting the base learners (100 xp)
- Evaluating base learners' performance (100 xp)
- Stacking the base learners (100 xp)
- Predicting on test data (100 xp)
Unsupervised learning
This chapter revolves around the most common types of unsupervised learning methods: clustering, and dimensionality reduction via unsupervised feature selection and feature extraction.
- K-means clustering (50 xp)
- Checking K-means assumptions (100 xp)
- Using the elbow method (100 xp)
- Interpreting your clustering results (100 xp)
- Clustering algorithms (50 xp)
- Comparing clustering methods: internal measures (100 xp)
- Comparing clustering methods: stability measures (100 xp)
- Visualizing cluster prototypes (100 xp)
- Feature selection (50 xp)
- Types of feature selection methods (50 xp)
- Removing near-zero-variance features (100 xp)
- Removing correlated features (100 xp)
- Feature extraction (50 xp)
- PCA to the rescue (100 xp)
- LDA to the rescue (100 xp)
Model selection and evaluation
This chapter covers topics related to model selection and evaluation, imbalanced classification, and hyperparameter tuning. It also sheds light on the commonalities and differences between two top-performing ensemble models: Random Forests and Gradient Boosted Trees.
- Model evaluation (50 xp)
- Evaluating classification models (100 xp)
- Evaluating regression models (100 xp)
- Evaluating clustering methods (100 xp)
- Handling imbalanced data (50 xp)
- Checking for class imbalance (100 xp)
- Applying subsampling in each resample (100 xp)
- Evaluating model performance (100 xp)
- Hyperparameter tuning (50 xp)
- Default grid search in caret (100 xp)
- Customizing the grid search (100 xp)
- Random search (100 xp)
- Random Forests or Gradient Boosted Trees? (50 xp)
- The Random Forest model (100 xp)
- The GBM model (100 xp)
- Evaluating GBM and RF predictions on test data (100 xp)
- You made it! (50 xp)
Data Scientist at Shopify
Rafael works as a data scientist for Shopify. His goal is to help make commerce better for everyone by supporting data-informed decision making at scale. Before joining Shopify, Rafael worked as a research scientist developing algorithms for multi-sensor data fusion, maritime domain awareness, risk management, and decision support systems. He is passionate about all things data science and enjoys networking and learning about the cool things other people are building.