As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In this course, you'll learn the many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL.
Introduction and flat files (Free)
In this chapter, you'll learn how to import data into Python from all types of flat files, which are a simple and prevalent form of data storage. You've previously learned how to use NumPy and pandas; here you will learn how to use these packages to import flat files and customize your imports.
Welcome to the course! - 50 xp
Exploring your working directory - 50 xp
Importing entire text files - 100 xp
Importing text files line by line - 100 xp
The importance of flat files in data science - 50 xp
Pop quiz: examples of flat files - 50 xp
Pop quiz: what exactly are flat files? - 50 xp
Why we like flat files and the Zen of Python - 50 xp
Importing flat files using NumPy - 50 xp
Using NumPy to import flat files - 100 xp
Customizing your NumPy import - 100 xp
Importing different datatypes - 100 xp
Working with mixed datatypes (1) - 50 xp
Working with mixed datatypes (2) - 100 xp
Importing flat files using pandas - 50 xp
Using pandas to import flat files as DataFrames (1) - 100 xp
Using pandas to import flat files as DataFrames (2) - 100 xp
Customizing your pandas import - 100 xp
Final thoughts on data import - 50 xp
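As a rough illustration of the plain-text exercises in this chapter (importing an entire text file and importing it line by line), here is a minimal sketch. It uses only Python's built-in `open()` rather than NumPy or pandas. The file name `sample.csv` and its contents are made up for the example and are not the course's data.

```python
import os
import tempfile

# Hypothetical sample data; the course's own files are not reproduced here.
sample_text = "Time,Percent\n99,0.067\n99,0.133\n99,0.067\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a small comma-delimited flat file so the example is self-contained.
    path = os.path.join(tmp_dir, "sample.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample_text)

    # Import the entire file as one string.
    with open(path, "r", encoding="utf-8") as f:
        whole_file = f.read()

    # Import the file line by line: the header first, then the first record.
    with open(path, "r", encoding="utf-8") as f:
        header = f.readline().rstrip("\n")
        first_row = f.readline().rstrip("\n")

    # Split every line on the delimiter to get a list of rows.
    with open(path, "r", encoding="utf-8") as f:
        rows = [line.rstrip("\n").split(",") for line in f]

print(header)
print(first_row)
print(len(rows))
```

The `with` statement closes the file automatically, so no explicit `close()` call is needed. Every value read this way is a string; converting columns to numeric types is left out of this sketch.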
Importing data from other file types
You've learned how to import flat files, but there are many other file types you may have to work with as a data scientist. In this chapter, you'll learn how to import data into Python from a wide array of important file types. These include pickled files, Excel spreadsheets, SAS and Stata files, HDF5 files (a file type for storing large quantities of numerical data), and MATLAB files.
Introduction to other file types - 50 xp
Not so flat any more - 50 xp
Loading a pickled file - 100 xp
Listing sheets in Excel files - 100 xp
Importing sheets from Excel files - 100 xp
Customizing your spreadsheet import - 100 xp
Importing SAS/Stata files using pandas - 50 xp
How to import SAS7BDAT - 50 xp
Importing SAS files - 100 xp
Using read_stata to import Stata files - 50 xp
Importing Stata files - 100 xp
Importing HDF5 files - 50 xp
Using File to import HDF5 files - 50 xp
Using h5py to import HDF5 files - 100 xp
Extracting data from your HDF5 file - 100 xp
Importing MATLAB files - 50 xp
Loading .mat files - 100 xp
The structure of .mat in Python - 100 xp
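Of the file types named in this chapter, pickled files can be read with the standard library alone. The sketch below shows one way to load one. The dictionary contents and the file name `data.pkl` are invented for illustration. The example writes the pickle itself first so that it is self-contained.

```python
import os
import pickle
import tempfile

# Hypothetical record, made up for this example.
record = {"June": "69.4", "Aug": "85", "Airline": "8", "Mar": "84.4"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pkl")

    # Pickles are binary, so the file is opened in "wb" mode for writing...
    with open(path, "wb") as f:
        pickle.dump(record, f)

    # ...and in "rb" mode for reading.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(type(loaded))
print(loaded)
```

Unpickling can execute arbitrary code, so `pickle.load` should only be used on files from a trusted source.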
Working with relational databases in Python
In this chapter, you'll learn how to extract meaningful data from relational databases, an essential skill for any data scientist. You will learn about relational models, how to create SQL queries, how to filter and order your SQL records, and how to perform advanced queries by joining database tables.
Introduction to relational databases - 50 xp
Pop quiz: The relational model - 50 xp
Creating a database engine in Python - 50 xp
Creating a database engine - 100 xp
What are the tables in the database? - 100 xp
Querying relational databases in Python - 50 xp
The Hello World of SQL Queries! - 100 xp
Customizing the Hello World of SQL Queries - 100 xp
Filtering your database records using SQL's WHERE - 100 xp
Ordering your SQL records with ORDER BY - 100 xp
Querying relational databases directly with pandas - 50 xp
Pandas and The Hello World of SQL Queries! - 100 xp
Pandas for more complex querying - 100 xp
Advanced querying: exploiting table relationships - 50 xp
The power of SQL lies in relationships between tables: INNER JOIN - 100 xp
Filtering your INNER JOIN - 100 xp
Final Thoughts - 50 xp
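The query patterns named in this chapter (a basic `SELECT`, filtering with `WHERE`, ordering with `ORDER BY`, and joining tables with `INNER JOIN`) can be sketched with Python's built-in `sqlite3` module. This is an assumption made to keep the example dependency-free; it does not use a database engine or pandas. The two tiny tables and their rows are hypothetical and exist only for this illustration.

```python
import sqlite3

# An in-memory SQLite database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables and rows, invented for this sketch.
cur.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
cur.execute(
    "CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)"
)
cur.executemany(
    "INSERT INTO Artist VALUES (?, ?)",
    [(1, "Artist A"), (2, "Artist B")],
)
cur.executemany(
    "INSERT INTO Album VALUES (?, ?, ?)",
    [(1, "First", 1), (2, "Second", 2), (3, "Third", 1)],
)
conn.commit()

# The simplest query: every column of every row in one table.
all_albums = cur.execute("SELECT * FROM Album").fetchall()

# Filter with WHERE and sort with ORDER BY; the ? placeholder passes the value safely.
filtered = cur.execute(
    "SELECT Title FROM Album WHERE ArtistId = ? ORDER BY Title DESC",
    (1,),
).fetchall()

# Combine rows from both tables on their shared key with INNER JOIN.
joined = cur.execute(
    "SELECT Album.Title, Artist.Name "
    "FROM Album INNER JOIN Artist ON Album.ArtistId = Artist.ArtistId "
    "ORDER BY Album.AlbumId"
).fetchall()

conn.close()

print(all_albums)
print(filtered)
print(joined)
```

Each query returns a list of tuples, one tuple per record. Loading the same results into a pandas DataFrame is a separate step not shown here.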
Datasets
Chinook (SQLite)
LIGO (HDF5)
Battledeath (XLSX)
Extent of infectious diseases (DTA)
Gene expressions (MATLAB)
MNIST
Sales (SAS7BDAT)
Seaslugs
Titanic
Data Scientist at DataCamp
Hugo is a data scientist, educator, writer, and podcaster at DataCamp. His main interests are promoting data & AI literacy, helping to spread data skills through organizations and society, and doing amateur stand-up comedy in NYC. To find out what he likes to talk about, check out DataFramed, the DataCamp podcast, which he hosts and produces: https://www.datacamp.com/community/podcast
What do other learners have to say?
I've used other sites—Coursera, Udacity, things like that—but DataCamp's been the one that I've stuck with.
Devon Edwards Joseph
Lloyds Banking Group
DataCamp is the top resource I recommend for learning data science.
Harvard Business School
DataCamp is by far my favorite website to learn from.
Decision Science Analytics, USAA