
Shen (Sean) Chen has completed

Data Processing in Shell

4 hr
3,550 XP
Statement of Accomplishment Badge



Course Description

We live in a busy world with tight deadlines. As a result, we fall back on what is familiar and easy, favoring GUI tools like Visual Studio and RStudio. However, taking the time to learn data analysis on the command line is a great long-term investment, because it makes us stronger and more productive data people. In this course, we will take a practical approach to learning simple, powerful, data-specific command-line skills. Using publicly available Spotify datasets, we will learn how to download, process, clean, and transform data, all via the command line. We will also learn advanced techniques such as command-line-based SQL database operations. Finally, we will combine the power of the command line and Python to build a data pipeline that automates a predictive model.
  Chapter 1: Downloading Data on the Command Line

    In this chapter, we learn how to download data files from web servers via the command line. In the process, we also learn about documentation manuals, option flags, and multi-file processing.

    Downloading data using curl (50 xp)
    Using curl documentation (50 xp)
    Downloading single file using curl (100 xp)
    Downloading multiple files using curl (100 xp)
    Downloading data using Wget (50 xp)
    Installing Wget (50 xp)
    Downloading single file using wget (100 xp)
    Advanced downloading using Wget (50 xp)
    Setting constraints for multiple file downloads (50 xp)
    Creating wait time using Wget (100 xp)
    Data downloading with Wget and curl (100 xp)

  Chapter 4: Data Pipeline on the Command Line

    In the last chapter, we bridge the gap between the command line and other data science languages and learn how they can work together. Using Python as a case study, we learn to execute Python on the command line, install dependencies with the package manager pip, and build an entire model pipeline using the command line.

Datasets

Spotify Songs Popularity Ranking
Spotify Song Attributes

Collaborators

Adrián Soto
Hillary Green-Lerman

Susan Sun

Data Freelancer

Over 15 years of experience in data science and data engineering in health tech, civic tech, and education.