Importing Data in Python Cheat Sheet

With this Python cheat sheet, you'll have a handy reference guide to importing your data, from flat files to files native to other software and relational databases.
Jun 2021  · 5 min read

Before you can clean, wrangle, or visualize data, you need to know how to get it into Python. There are many ways to import data into Python, and the right one depends on the kind of file you're dealing with.

You'll most often use the pandas and NumPy libraries. pandas is one of the most popular tools among data scientists for data manipulation and analysis, alongside matplotlib for data visualization and NumPy, the fundamental library for scientific computing in Python, on which pandas is built.

In this Importing Data in Python cheat sheet, you'll find NumPy and pandas functions, together with functions built into the Python programming language, that will help you get your data into Python fast!

This quick guide helps you to learn the basics of importing data in Python that you will need to get started on cleaning and wrangling your data!

The Importing Data in Python cheat sheet will guide you through the basics of getting your data in your workspace: you'll not only learn how to import flat files such as text files, but you'll also see how you can get data from files native to other software such as Excel spreadsheets, Stata, SAS and MATLAB files and relational databases. On top of that, you'll get some more information on how to ask for help, how to navigate your filesystem and how to start exploring your data.

In short, everything that you need to kickstart your data science learning with Python!

Do you want to learn more? Start the Importing Data in Python course for free now or try out our Python Excel tutorial!

Also, don't miss out on our Python cheat sheet for data science, or the many others we have here on our Community!

Importing Data in Python 

Most of the time, you'll use either NumPy or pandas to import your data:

>>> import numpy as np
>>> import pandas as pd


>>> help(pd.read_csv)

Text Files 

Plain Text Files 

>>> filename = 'huck_finn.txt'
>>> file = open(filename, mode='r') #Open the file for reading
>>> text = file.read() #Read a file's contents
>>> print(file.closed) #Check whether file is closed
>>> file.close() #Close file
>>> print(text)

Use the context manager with:

>>> with open('huck_finn.txt', 'r') as file: 
    print(file.readline()) #Read a single line 
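
If you want to try the context manager without having 'huck_finn.txt' on disk, the following is a minimal sketch. The temporary file and its three lines are assumptions made up for the illustration.

```python
import os
import tempfile

# Create a small throwaway text file (a stand-in for 'huck_finn.txt').
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\nthird line\n")

# The file is closed automatically when the with-block ends.
with open(path, "r", encoding="utf-8") as file:
    first = file.readline()   # Read a single line, newline included
    rest = file.readlines()   # Read the remaining lines into a list

print(file.closed)      # True: no explicit file.close() needed
print(first.rstrip())   # first line
print(len(rest))        # 2
```

Because the with-block closes the file for you, there is no risk of forgetting file.close() when an error occurs halfway through reading.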

Table Data: Flat Files 

Importing Flat Files with NumPy 

Files with one data type: 

>>> filename = 'mnist.txt'
>>> data = np.loadtxt(filename,
    delimiter=',', #String used to separate values
    skiprows=2, #Skip the first 2 lines
    usecols=[0,2], #Read the 1st and 3rd column
    dtype=str) #The type of the resulting array
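
To see what these np.loadtxt() arguments do, here is a small sketch that reads from an in-memory buffer instead of 'mnist.txt'. The sample text and the choice of dtype=int are assumptions for the illustration.

```python
from io import StringIO

import numpy as np

# In-memory stand-in for a flat file with two header lines.
raw = StringIO(
    "sample file\n"
    "a,b,c\n"
    "1,2,3\n"
    "4,5,6\n"
)

data = np.loadtxt(
    raw,
    delimiter=",",    # Values are separated by commas
    skiprows=2,       # Skip the title line and the column names
    usecols=[0, 2],   # Keep only the 1st and 3rd column
    dtype=int,        # Every value is parsed as an integer
)

print(data.shape)  # (2, 2)
print(data)
```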

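Files with mixed data types:

>>> filename = 'titanic.csv'
>>> data = np.genfromtxt(filename,
    delimiter=',', #String used to separate values
    names=True, #Look for column header
    dtype=None) #Infer the type of each column
>>> data_array = np.recfromcsv(filename)
#The default dtype of the np.recfromcsv() function is None
#In NumPy 2.0 and later, np.recfromcsv() is no longer in the main namespace; np.genfromtxt() with delimiter=',' covers the same case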
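
For mixed column types, np.genfromtxt() returns a structured array whose fields you access by column name. The sketch below uses a made-up three-column sample rather than 'titanic.csv'; the column names and values are assumptions.

```python
from io import StringIO

import numpy as np

# In-memory stand-in for a CSV file with text and numeric columns.
raw = StringIO(
    "name,age,fare\n"
    "Allen,29,211.3\n"
    "Braund,22,7.25\n"
)

data = np.genfromtxt(
    raw,
    delimiter=",",
    names=True,        # Take field names from the first line
    dtype=None,        # Infer a type per column
    encoding="utf-8",  # Decode text columns as str rather than bytes
)

print(data.dtype.names)  # ('name', 'age', 'fare')
print(data["age"])       # Access one column by its field name
```

The result is one-dimensional: each element is a record, so data.shape reports the number of rows only.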

Importing Flat Files with Pandas 

>>> filename = 'winequality-red.csv'
>>> data = pd.read_csv(filename,
         nrows=5, #Number of rows of file to read
         header=None, #Row number to use as col names (None: the file has no header row)
         sep='\t', #Delimiter to use
         comment='#', #Character that marks the rest of a line as a comment
         na_values=[""]) #String to recognize as NA/NaN
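
The following sketch shows how sep, comment, nrows, and na_values interact on a tiny tab-separated sample. The data, the column names, and the "missing" marker are assumptions for the illustration; here the first non-comment line is used as the header.

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for a tab-separated file with a comment line.
raw = StringIO(
    "# sample data\n"
    "acidity\tsugar\tquality\n"
    "7.4\t1.9\t5\n"
    "7.8\tmissing\t5\n"
    "11.2\t1.9\t6\n"
)

df = pd.read_csv(
    raw,
    sep="\t",               # Columns are separated by tabs
    comment="#",            # Skip everything after a '#'
    nrows=2,                # Read only the first 2 data rows
    na_values=["missing"],  # Treat this marker as NaN
)

print(df)
print(df["sugar"].isna().tolist())  # [False, True]
```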

Exploring Your Data 

NumPy Arrays 

>>> data_array.dtype #Data type of array elements
>>> data_array.shape #Array  dimensions
>>> len(data_array) #Length of array

Pandas DataFrames 

>>> df.head() #Return first DataFrame rows
>>> df.tail() #Return last DataFrame rows
>>> df.index #Describe index
>>> df.columns #Describe DataFrame columns
>>> df.info() #Info on DataFrame
>>> data_array = data.values #Convert a DataFrame to a NumPy array
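
As a quick way to try these exploration calls, here is a sketch on a small hand-made DataFrame (the column names and values are assumptions). It uses df.to_numpy(), which the pandas documentation recommends over the .values attribute.

```python
import pandas as pd

# A small hand-made DataFrame to explore.
df = pd.DataFrame({"country": ["A", "B", "C"], "pop": [1.5, 2.5, 3.5]})

first_rows = df.head(2)   # First 2 rows
last_rows = df.tail(1)    # Last row
df.info()                 # Prints column names, dtypes and non-null counts

# Convert the DataFrame to a NumPy array; mixed columns give dtype object.
data_array = df.to_numpy()
print(data_array.shape)   # (3, 2)
```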

SAS File 

>>> from sas7bdat import SAS7BDAT
>>> with SAS7BDAT('urbanpop.sas7bdat') as file:
    df_sas = file.to_data_frame()

Stata File 

>>> data = pd.read_stata('urbanpop.dta')
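
If you don't have a .dta file at hand, you can write one with pandas first and then read it back. This is only a round-trip sketch; the file name and the two columns are assumptions.

```python
import os
import tempfile

import pandas as pd

# Write a tiny DataFrame to a temporary Stata file.
path = os.path.join(tempfile.mkdtemp(), "sample.dta")
original = pd.DataFrame({"country": ["A", "B"], "pop": [10, 20]})
original.to_stata(path, write_index=False)

# Read it back with pd.read_stata().
data = pd.read_stata(path)
print(data)
```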

Excel Spreadsheets 

>>> file = 'urbanpop.xlsx'
>>> data = pd.ExcelFile(file)
>>> df_sheet2 = data.parse('1960-1966',
          names=['Country', 'AAM: War(2002)'])
>>> df_sheet1 = data.parse(0) #Parse the first sheet by its index

To access the sheet names, use the sheet_names attribute:

>>> data.sheet_names

Relational Databases

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///Northwind.sqlite')

Use the table_names() method to fetch a list of table names (deprecated since SQLAlchemy 1.4 in favor of inspect(engine).get_table_names()):

>>> table_names = engine.table_names()

Querying Relational Databases 

>>> con = engine.connect()
>>> rs = con.execute("SELECT * FROM Orders")
>>> df = pd.DataFrame(rs.fetchall())
>>> df.columns = rs.keys()
>>> con.close()

Using the context manager with:

>>> with engine.connect() as con:
          rs = con.execute("SELECT OrderID FROM Orders")
          df = pd.DataFrame(rs.fetchmany(size=5))
          df.columns = rs.keys()
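
Recent SQLAlchemy releases (2.0 and later) no longer accept a plain string in con.execute(); the statement has to be wrapped in text(). The sketch below shows the same connect, query, and fetch steps in that style, and also lists table names through the inspector. The temporary database, the Orders columns, and the three rows are assumptions made up for the illustration, not the Northwind data.

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine, inspect, text

# Build a throwaway SQLite database file (a stand-in for Northwind.sqlite).
db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
engine = create_engine(f"sqlite:///{db_path}")

# engine.begin() opens a transaction and commits it when the block ends.
with engine.begin() as con:
    con.execute(text("CREATE TABLE Orders (OrderID INTEGER, Customer TEXT)"))
    con.execute(text("INSERT INTO Orders VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy')"))

# List the table names through the inspector.
table_names = inspect(engine).get_table_names()

# Query with the context manager and fetch only the first 2 rows.
with engine.connect() as con:
    rs = con.execute(text("SELECT * FROM Orders"))
    df = pd.DataFrame(rs.fetchmany(size=2), columns=list(rs.keys()))

engine.dispose()  # Release the connection pool
print(table_names)
print(df)
```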

Querying Relational Databases with Pandas 

>>> df = pd.read_sql_query("SELECT * FROM Orders", engine)
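
pd.read_sql_query() also accepts a connection from Python's built-in sqlite3 module, which is handy for a quick test without SQLAlchemy. The sketch below uses an in-memory database; the Orders columns and rows are assumptions for the illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a small, made-up Orders table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderID INTEGER, Freight REAL)")
con.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 32.4), (2, 11.6), (3, 65.8)],
)

# Run the query and load the result into a DataFrame in one step.
# The '?' placeholder is filled from params, which avoids string formatting.
df = pd.read_sql_query(
    "SELECT * FROM Orders WHERE Freight > ?", con, params=(20,)
)
con.close()

print(df)
```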

Pickled Files

>>> import pickle
>>> with open('pickled_fruit.pkl', 'rb') as file: 
          pickled_data = pickle.load(file)
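
A pickled file has to be written before it can be read, so here is a round-trip sketch. The dictionary contents and the temporary path are assumptions. Keep in mind that unpickling can execute arbitrary code, so only load pickle files from sources you trust.

```python
import os
import pickle
import tempfile

# A made-up dictionary to store.
fruit = {"apples": 3, "pears": 5}
path = os.path.join(tempfile.mkdtemp(), "pickled_fruit.pkl")

# Write the object in binary mode ('wb').
with open(path, "wb") as file:
    pickle.dump(fruit, file)

# Read it back in binary mode ('rb').
with open(path, "rb") as file:
    pickled_data = pickle.load(file)

print(pickled_data)  # {'apples': 3, 'pears': 5}
```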

Matlab Files 

>>> import scipy.io
>>> filename = 'workspace.mat'
>>> mat = scipy.io.loadmat(filename)
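
MATLAB files are commonly read with scipy.io.loadmat(), which returns a dictionary of variable names and arrays. The sketch below first writes a small file with scipy.io.savemat() so it is self-contained; the variable name x and its values are assumptions.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Write a tiny .mat file so there is something to load.
path = os.path.join(tempfile.mkdtemp(), "workspace.mat")
scipy.io.savemat(path, {"x": np.arange(3)})

# loadmat() returns a dict: variable names map to NumPy arrays.
mat = scipy.io.loadmat(path)

# Keys starting with '__' hold file metadata, not workspace variables.
user_keys = sorted(k for k in mat if not k.startswith("__"))
print(user_keys)       # ['x']
print(mat["x"].shape)  # (1, 3): MATLAB arrays are at least 2-D
```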

HDF5 Files 

>>> import h5py
>>> filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
>>> data = h5py.File(filename, 'r')

Exploring Dictionaries 

>>> print(mat.keys()) #Print dictionary keys
>>> for key in data.keys(): #Print dictionary keys
    print(key)
>>> pickled_data.values() #Return dictionary values
>>> print(mat.items()) #Returns items in list format of (key, value) tuple pairs

Accessing Data Items with Keys

>>> for key in data['meta'].keys(): #Explore the HDF5 structure
    print(key)
#Retrieve the value for a key
>>> print(data['meta']['Description'][()]) #.value was removed in h5py 3.0; [()] reads the whole dataset
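
An HDF5 file behaves like nested dictionaries: groups contain other groups or datasets, and indexing a dataset with [()] reads its full contents. The sketch below builds a small file first so it can run on its own; the group and dataset names and their values are made up for the illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small HDF5 file with two groups (hypothetical layout).
path = os.path.join(tempfile.mkdtemp(), "sample.hdf5")
with h5py.File(path, "w") as f:
    meta = f.create_group("meta")
    meta.create_dataset("GPSstart", data=815411200)
    strain_group = f.create_group("strain")
    strain_group.create_dataset("Strain", data=np.arange(4.0))

# Open the file read-only and walk the structure by key.
with h5py.File(path, "r") as data:
    top_keys = sorted(data.keys())                # Group names at the top level
    meta_keys = sorted(data["meta"].keys())       # Dataset names inside 'meta'
    gps_start = data["meta"]["GPSstart"][()]      # Read a scalar dataset
    strain = data["strain"]["Strain"][()]         # Read an array dataset

print(top_keys)   # ['meta', 'strain']
print(meta_keys)  # ['GPSstart']
print(strain)
```

Reading with [()] copies the values into memory, so they stay available after the file is closed.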

Magic Commands 

!ls #List directory contents of files and directories
%cd .. #Change current working directory
%pwd #Return the current working directory path

OS Library 

>>> import os
>>> path = "/usr/tmp"
>>> wd = os.getcwd() #Store the name of current directory in a string
>>> os.listdir(wd) #Output contents of the directory in a list
>>> os.chdir(path) #Change current working directory
>>> os.rename("test1.txt", "test2.txt") #Rename a file ("test2.txt" is an illustrative new name)
>>> os.remove("test1.txt") #Delete an existing file
>>> os.mkdir("newdir") #Create a new directory
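
To try these os functions safely, you can run them inside a temporary directory, as in the sketch below. The file and directory names are assumptions, and the original working directory is restored at the end.

```python
import os
import tempfile

start_dir = os.getcwd()        # Remember where we started
work_dir = tempfile.mkdtemp()  # Scratch directory for the experiment
os.chdir(work_dir)             # Change current working directory

try:
    with open("test1.txt", "w") as f:
        f.write("hello")

    os.rename("test1.txt", "test2.txt")      # Rename a file
    os.mkdir("newdir")                       # Create a new directory
    after_rename = sorted(os.listdir("."))   # Contents as a sorted list

    os.remove("test2.txt")                   # Delete an existing file
    after_remove = sorted(os.listdir("."))
finally:
    os.chdir(start_dir)                      # Always go back

print(after_rename)  # ['newdir', 'test2.txt']
print(after_remove)  # ['newdir']
```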
