
Course Description
As a data scientist, you will clean, wrangle, and visualize data, build predictive models, and interpret those models on a daily basis. Before you can do any of this, however, you need to know how to get data into Python. In this course, you'll learn the many ways to import data into Python: (i) from flat files such as .txt and .csv files; (ii) from files native to other software, such as Excel spreadsheets and Stata, SAS, and MATLAB files; (iii) from relational databases such as SQLite and PostgreSQL; (iv) from the web; and (v) from Application Programming Interfaces (APIs), a special and essential case of web data, such as the Twitter streaming API, which lets you stream real-time tweets.
Training 2 or more people?
Get your team access to the full DataCamp platform, including all the features.
1. Introduction and flat files
Free
In this chapter, you'll learn how to import data into Python from all types of flat files, a simple and prevalent form of data storage. You've previously learned how to use NumPy and pandas; here you will use these packages to import flat files and to customize your imports.
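As a rough illustration of the two approaches this chapter names, the sketch below reads the same small comma-separated table with NumPy and with pandas. The data, column names, and the use of an in-memory buffer instead of a file path are assumptions made for the example, not material from the course.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical flat-file contents; a real import would pass a file path instead.
raw = "year,temp\n2019,14.5\n2020,15.1\n2021,14.9\n"

# NumPy: all values end up in one array with a single dtype,
# so the header row has to be skipped explicitly.
arr = np.loadtxt(io.StringIO(raw), delimiter=",", skiprows=1)

# pandas: the header becomes the column names and each column keeps its own dtype.
# nrows limits the import to the first two data rows (a simple customization).
df = pd.read_csv(io.StringIO(raw), sep=",", nrows=2)

print(arr.shape)
print(df)
```

NumPy suits purely numerical files, while pandas tends to be the more convenient choice when columns have mixed types.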
Welcome to the course! 50 xp
Exploring your working directory 50 xp
Importing entire text files 100 xp
Importing text files line by line 100 xp
The importance of flat files in data science 50 xp
Pop quiz: examples of flat files 50 xp
Pop quiz: what exactly are flat files? 50 xp
Why we like flat files and the Zen of Python 50 xp
Importing flat files using NumPy 50 xp
Using NumPy to import flat files 100 xp
Customizing your NumPy import 100 xp
Importing different datatypes 100 xp
Working with mixed datatypes (1) 50 xp
Working with mixed datatypes (2) 100 xp
Importing flat files using pandas 50 xp
Using pandas to import flat files as DataFrames (1) 100 xp
Using pandas to import flat files as DataFrames (2) 100 xp
Customizing your pandas import 100 xp
Final thoughts on data import 50 xp
2. Importing data from other file types
You've learned how to import flat files, but there are many other file types you may have to work with as a data scientist. In this chapter, you'll learn how to import data into Python from a wide array of important file types: pickled files, Excel spreadsheets, SAS and Stata files, HDF5 files (a file type for storing large quantities of numerical data), and MATLAB files.
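Of the file types listed, pickled files are the only ones the Python standard library handles on its own, so they make a convenient minimal sketch. The object being stored and the temporary file name below are hypothetical; the other formats rely on third-party packages and are not shown here.

```python
import os
import pickle
import tempfile

# Hypothetical object to store; pickled files can hold arbitrary Python objects.
record = {"city": "NYC", "readings": [1.5, 2.0, 3.25]}

# Write to a throwaway location so the example leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), "data.pkl")

# Pickled files are binary, so they are written with "wb" and read with "rb".
with open(path, "wb") as f:
    pickle.dump(record, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(type(loaded), loaded)
```

Unpickling can execute arbitrary code, so only load pickled files from sources you trust.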
Introduction to other file types 50 xp
Not so flat any more 50 xp
Loading a pickled file 100 xp
Listing sheets of Excel spreadsheets 100 xp
Importing sheets of Excel spreadsheets 100 xp
Customizing your spreadsheet import 100 xp
Importing SAS/Stata files using pandas 50 xp
How to import SAS7BDAT 50 xp
Importing SAS files 100 xp
Using read_stata to import Stata files 50 xp
Importing Stata files 100 xp
Importing HDF5 files 50 xp
Using File to import HDF5 files 50 xp
Using h5py to import HDF5 files 100 xp
Extracting data from your HDF5 file 100 xp
Importing MATLAB files 50 xp
Loading .mat files 100 xp
The structure of .mat in Python 100 xp
3. Working with relational databases in Python
In this chapter, you'll learn how to extract meaningful data from relational databases, an essential element of any data scientist's toolkit. You will be learning about the relational model, creating SQL queries, filtering and ordering your SQL records, and advanced querying by JOINing database tables.
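The query types mentioned above (filtering, ordering, and joining tables) can be sketched with Python's built-in sqlite3 module. This is an assumption made to keep the example self-contained: the course's own exercises refer to a database engine and pandas, and the tables and rows below are invented for illustration.

```python
import sqlite3

# In-memory SQLite database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (album_id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER);
    INSERT INTO artist VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO album VALUES (1, 'First', 1), (2, 'Second', 2), (3, 'Third', 1);
""")

# Filter records with WHERE and sort them with ORDER BY.
filtered = conn.execute(
    "SELECT title FROM album WHERE artist_id = ? ORDER BY title DESC", (1,)
).fetchall()

# INNER JOIN combines rows from both tables where the key columns match.
joined = conn.execute(
    "SELECT album.title, artist.name FROM album "
    "INNER JOIN artist ON album.artist_id = artist.artist_id "
    "ORDER BY album.album_id"
).fetchall()

conn.close()

print(filtered)
print(joined)
```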
Introduction to relational databases 50 xp
Pop quiz: The relational model 50 xp
Creating a database engine 100 xp
What are the tables in the database? 100 xp
Querying relational databases in Python 50 xp
The Hello World of SQL Queries! 100 xp
Customizing the Hello World of SQL Queries 100 xp
Filtering your database records using SQL's WHERE 100 xp
Ordering your SQL records with ORDER BY 100 xp
Querying relational databases directly with pandas 50 xp
Pandas and The Hello World of SQL Queries! 100 xp
Pandas for more complex querying 100 xp
Advanced Querying: exploiting table relationships 50 xp
The power of SQL lies in relationships between tables: INNER JOIN 100 xp
Filtering your INNER JOIN 100 xp
Final Thoughts 50 xp
4. Importing data from the Internet
The web is a rich source of data from which you can extract various types of insights and findings. In this chapter, you will learn how to get data from the web, whether it be stored in files or in HTML. You'll also learn the basics of scraping and parsing web data.
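To give a feel for what parsing web data involves, here is a small sketch that pulls the text and the hyperlinks out of an HTML string. It uses the standard library's html.parser rather than BeautifulSoup (which the exercises name) so that it runs without extra packages and without network access; the HTML snippet and the class name are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page source; a real workflow would download this first.
html = """<html><head><title>Demo page</title></head>
<body><p>Hello, <a href="https://example.com/a">first</a> and
<a href="https://example.com/b">second</a>.</p></body></html>"""


class LinkAndTextParser(HTMLParser):
    """Collects href values of <a> tags and all non-blank text fragments."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep only fragments that contain something other than whitespace.
        if data.strip():
            self.text_parts.append(data.strip())


parser = LinkAndTextParser()
parser.feed(html)

print(parser.links)
print(" ".join(parser.text_parts))
```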
Importing flat files from the web 50 xp
Importing flat files from the web: your turn! 100 xp
Opening and reading flat files from the web 100 xp
Importing non-flat files from the web 100 xp
HTTP requests to import files from the web 50 xp
Performing HTTP requests in Python using urllib 100 xp
Printing HTTP request results in Python using urllib 100 xp
Performing HTTP requests in Python using requests 100 xp
Scraping the web in Python 50 xp
Parsing HTML with BeautifulSoup 100 xp
Turning a webpage into data using BeautifulSoup: getting the text 100 xp
Turning a webpage into data using BeautifulSoup: getting the hyperlinks 100 xp
5. Interacting with APIs to import data from the web
In this chapter, you will build on your knowledge of importing data from the web. You will learn the basics of extracting data from APIs, gain insight into why APIs matter, and practice getting data from them with dives into the OMDB, Wikipedia and Twitter APIs.
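APIs commonly return their data as JSON, which maps naturally onto Python dictionaries and lists. The sketch below decodes a JSON string with the standard library; the payload and its field names are made up for illustration and do not reproduce any particular API's response.

```python
import json

# Hypothetical response body; a real one would come from an HTTP request to an API.
payload = (
    '{"Title": "Example Film", "Year": "2010", '
    '"Ratings": [{"Source": "A", "Value": "8/10"}]}'
)

# json.loads turns the JSON text into nested dicts and lists.
data = json.loads(payload)

for key, value in data.items():
    print(key, ":", value)

# Nested structures are reached with ordinary indexing.
first_rating = data["Ratings"][0]["Value"]
print(first_rating)
```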
Introduction to APIs and JSONs 50 xp
Pop quiz: What exactly is a JSON? 50 xp
Loading and exploring a JSON 100 xp
Pop quiz: Exploring your JSON 50 xp
APIs and interacting with the world wide web 50 xp
Pop quiz: What's an API? 50 xp
API requests 100 xp
JSON - from the web to Python 100 xp
Checking out the Wikipedia API 100 xp
The Twitter API and Authentication 50 xp
API Authentication 100 xp
Streaming tweets 100 xp
Load and explore your Twitter data 100 xp
Twitter data to DataFrame 100 xp
A little bit of Twitter text analysis 100 xp
Plotting your Twitter data 100 xp
Final Thoughts 50 xp
Data Scientist at DataCamp
Hugo hearts all things Pythonic and is charged with building out
DataCamp’s Python curriculum. He can be found at hackathons, meetups & code
sprints, primarily in NYC. Before joining the ranks of DataCamp, he worked in
applied mathematics (biology) research at Yale University.
Data Scientist
Hugo is a data scientist, educator, writer and podcaster formerly at DataCamp. His main interests are promoting data & AI literacy, helping to spread data skills through organizations and society, and doing amateur stand-up comedy in NYC. If you want to know what he likes to talk about, check out DataFramed, the DataCamp podcast, which he hosted and produced.
Join over 19 million learners and start Importing Data in Python today!