12 Useful Data Science Walkthroughs
So you’ve developed some base skills in programming, data visualization, data manipulation, and so on, and you’re looking for ways to apply those skills and build a data science portfolio?
We’re here to help.
Practicing your skills on concrete examples will boost your data science confidence and help you identify and solve problems in the real world. For this reason, we’ve put together a collection of high-quality walkthroughs covering Text Mining, Machine Learning (ML), Deep Learning, Finance, and more.
Check it out and let us know your favorite!
Text Mining in R
- In this 3-part tutorial, you will learn how to scrape H-1B visa data with R. DataCamp instructor Ted Kwartler walks you through how to parse and store the JSON data, perform Exploratory Data Analysis, add visuals, and finally create a map of the data using a geocoding API. This walkthrough is valuable because it shows all the steps a data scientist would take to answer a question: Can Data Help Your H-1B Visa Application?
- Characterizing Twitter followers with tidytext - Explore tidytext in this walkthrough by analyzing your Twitter followers’ descriptions to learn more about them.
Data Mining (Python)
- Introduction to Market Basket Analysis in Python - Learn how to use market basket analysis to find common patterns of items in large datasets. This walkthrough applies the technique to a large online retail data set to look for interesting purchase combinations.
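To give a feel for what market basket analysis computes, here is a minimal sketch in plain Python. It is not the code from the walkthrough: the transactions are invented toy data, the function name `pair_rules` is hypothetical, and only item pairs are considered. For each frequent pair it reports support, confidence, and lift.

```python
from collections import Counter
from itertools import combinations

# Toy transactions, invented purely for illustration
# (not the online retail data used in the walkthrough).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "jam"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence, lift) for frequent pairs."""
    n = len(transactions)
    # How many transactions contain each single item.
    item_counts = Counter(item for t in transactions for item in t)
    # How many transactions contain each unordered pair of items.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    # Highest lift first.
    return sorted(rules, key=lambda r: r[4], reverse=True)


rules = pair_rules(transactions)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than they would if they were bought independently. On a real retail data set you would likely reach for a dedicated library rather than this brute-force pair count.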
Machine Learning (ML) is increasingly essential in a data scientist’s toolbox for both R and Python. Advances in ML are a big reason why data science has become such an in-demand skill. The 3 walkthroughs below show you how to use scikit-learn (Python) and caret (R) along with a series of Machine Learning techniques.
- Python Machine Learning: Scikit-Learn Tutorial - This introductory post covers the basics of scikit-learn using digits data. The techniques covered here are Principal Component Analysis (PCA), Support Vector Machines (SVM), and K-Means algorithms.
- Scikit-Learn Tutorial: Baseball Analytics - This 2-part walkthrough uses baseball datasets to predict how many games Major League Baseball (MLB) teams win per season based on team statistics, and which players will be voted into the Hall of Fame based on career statistics and awards. The techniques covered here are Linear Regression, K-Means, Logistic Regression, and Random Forest.
- Machine Learning in R For Beginners - This walkthrough covers multi-class classification with the well-known k-nearest neighbors algorithm, with the help of the caret library. This short introduction to ML in R is a must for R learners, and the data used here is the famous iris dataset.
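As a rough illustration of the k-nearest neighbors idea mentioned in the list above, here is a minimal sketch in plain Python rather than R and caret. The points and labels are made-up toy data (not iris), and `knn_predict` is a hypothetical helper name: a query point gets the majority label of its k closest training points.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D measurements with three invented classes (hypothetical data).
train_points = [
    (1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
    (5.0, 5.2), (5.1, 4.8), (4.9, 5.0),
    (9.0, 1.0), (9.2, 1.1), (8.8, 0.9),
]
train_labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

for query in [(1.1, 1.0), (5.0, 5.0), (9.1, 1.0)]:
    print(query, "->", knn_predict(train_points, train_labels, query))
```

In practice, scikit-learn or caret handle details this sketch ignores, such as feature scaling and choosing k by cross-validation.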
Building a Classifier
- What I Learned From Implementing A Classifier From Scratch - This is a great walkthrough for understanding what is under the hood of Machine Learning. Without using a pre-existing library, you build a classifier from scratch to better understand its inner workings.
- Forecasting Website Traffic Using Facebook’s Prophet Library - Facebook open-sourced an R and Python library called prophet to automate the forecasting process. This walkthrough introduces this library and uses it to predict traffic volume for a website.
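To show what "a classifier from scratch" can look like, here is a minimal sketch of a perceptron in plain Python. This is an assumption chosen for illustration, not necessarily the algorithm the walkthrough implements; the function names, toy samples, and hyperparameters are all hypothetical.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=0.1):
    """Fit weights and a bias with the perceptron update rule (labels are +1 or -1)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation >= 0 else -1
            if predicted != y:
                # Nudge the decision boundary toward the misclassified sample.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else -1


# Linearly separable toy data, invented for this example.
samples = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5),
           (-2.0, -1.0), (-3.0, -0.5), (-1.5, -2.0)]
labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_perceptron(samples, labels)
print("weights:", weights, "bias:", bias)
print("prediction for (4, 2):", predict(weights, bias, (4.0, 2.0)))
```

The perceptron only converges when the classes are linearly separable, which is one of the limitations that becomes obvious once you write the update rule yourself.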
Deep Learning gets even more attention than Machine Learning in the data science world. Companies are investing in infrastructure and talent to take advantage of this new field. If you want to become an elite data scientist, Deep Learning is a must.
Keras (R + Python)
- Keras Tutorial: Deep Learning in Python - Build a Multi-Layer Perceptron (MLP) for classification and regression tasks using a wine data set.
- keras: Deep Learning in R - The Keras package was recently launched in R, so be an early adopter! Here you will build an MLP for multi-class classification, again using the iris dataset.
- TensorFlow Tutorial For Beginners (Python) - Work on Belgian traffic signs data with Google’s very own TensorFlow, one of the more promising deep learning libraries.
- Python For Finance: Algorithmic Trading - Perform financial analysis, develop a trading strategy, and backtest it using Quantopian in this popular walkthrough.
For more data science content, create a free DataCamp account to receive a newsletter every Tuesday with the best data science news and projects!