Skip to main content

Importing Data in Python Cheat Sheet

With this Python cheat sheet, you'll have a handy reference guide to importing your data, from flat files to files native to other software, and relational databases.
Jun 2021  · 5 min read

Before doing any data cleaning, wrangling, visualizing, ... You'll need to know how to get data into Python. As you know, there are many ways to import data into Python, depending also on which files you're dealing with.

However, you'll most often make use of the pandas and the NumPy libraries: The pandas library is one of the most preferred tools for data scientists to do data manipulation and analysis, next to matplotlib for data visualization and NumPy, the fundamental library for scientific computing in Python on which Pandas was built.

In this importing data in Python cheat sheet, you'll find some NumPy and pandas functions, together with functions that are built in the Python programming language, that will help you to get your data in Python fast!

This quick guide helps you to learn the basics of importing data in Python that you will need to get started on cleaning and wrangling your data!

data import python cheat sheet

Have this cheat sheet at your fingertips

Download PDF

The Importing Data in Python cheat sheet will guide you through the basics of getting your data in your workspace: you'll not only learn how to import flat files such as text files, but you'll also see how you can get data from files native to other software such as Excel spreadsheets, Stata, SAS and MATLAB files and relational databases. On top of that, you'll get some more information on how to ask for help, how to navigate your filesystem and how to start exploring your data.

In short, everything that you need to kickstart your data science learning with Python!

Do you want to learn more? Start the Importing Data in Python course for free now or try out our Python Excel tutorial!

Also, don't miss out on our Python cheat sheet for data science, or the many others we have here on our Community!

Importing Data in Python 

Most of the time, you'll use either NumPy or pandas to import your data:

>>> import numpy as np
>>> import pandas as pd

Help 

>>> np.info(np.ndarray.dtype)
>>> help(pd.read_csv)

Text Files 

Plain Text Files 

>>>filename= 'huck_finn.txt'
>>>file= open(filename, mode='r') #Open the file for reading
>>>text= file.read() #Read a file's contents
>>> print(file.closed) #Check whether file is closed
>>> file.close() #Close file
>>> print( text)

Use the content manager with:

>>> with open('huck_finn.txt', 'r') as file: 
    print(file.readline()) #Read a single line 
    print(file.readline()) 
    print(file.readline())

Table Data: Flat Files 

Importing Flat Files with NumPy 

>>>filename= 'huck_finn.txt'
>>>file= open(filename, mode='r') #Open the file for reading
>>>text= file.read() #Read a file's contents
>>> print(file.closed) #Check whether file is closed
>>> file.close() #Close file
>>> print(text)

Files with one data type: 

>>>filename= 'mnist.txt'
>>>data= np.loadtxt(filename,
    delimiter=',', #String used to separate values 
    skiprows=2, #Skip the first 2 lines 
    usecols=[0,2], #Read the 1st and 3rd column 
    dtype=str) #The type of the resulting array

Files with mixed data type 

>>>filename= 'titanic.csv'
>>>data= np.genfromtxt(filename,
    delimiter=',',
    names=True, #Look for column header
    dtype=None)
>>> data_array = np.recfromcsv(filename)
#The default dtype of the np.recfromcsv() function is None

Importing Flat Files with Pandas 

>>>filename= 'winequality-red.csv'
>>>data= pd.read_csv(filename,
         nrows=5, #Number of rows of file to read 
         header=None, #Row number to use as col names 
         sep='\t', #Delimiter  to use 
         comment='#', #Character to split comments 
         na_values=[""]) #String to recognize as NA/NaN

Exploring Your Data 

NumPy Arrays 

>>> data_array.dtype #Data type of array elements
>>> data_array.shape #Array  dimensions
>>> len(data_array) #Length of array

Pandas DataFrames 

>>> df.head() #Return first DataFrame rows
>>> df.tail() #Return last DataFrame rows
>>> df.index #Describe index
>>> df.columns #Describe DataFrame columns
>>> df.info() #Info an DataFrame
>>> data_array = data.values #Convert a DataFrame to an a NumPy array

SAS File 

>>> from sas7bdat import SAS7BDAT
>>> with SAS7BDAT('urbanpop.sas7bdat') as file: df_sas = file.to_data_frame()

Stata File 

>>>data= pd.read_stata('urbanpop.dta')

Excel Spreadsheets 

>>>file= 'urbanpop.xlsx'
>>>data= pd.ExcelFile(file)
>>> df sheet2 = data.parse('1960-1966',
          skiprows=[0], 
          names=['Country', 'AAM: War(2002)'])
>>> df sheetl = data.parse(0,
          parse_cols=[0], 
          skiprows=[0], 
          names=['Country'])

To access the sheet names, use the sheet_names attribute:

>>> data.sheet_names

Relational Databases

>>> from sqlalchemy import create_engine
>>>engine= create_engine('sqlite://Northwind.sqlite')

Use the table_names() method to fetch a list of table names:

>>> table_names = engine.table_names()

Querying Relational Databases 

>>>con= engine.connect()
>>> rs= con.execute("SELECT * FROM Orders")
>>> df = pd.DataFrame(rs.fetchall())
>>> df.columns = rs.keys()
>>> con.close()

Using the context manager with

>>> with engine.connect() as con:
          rs= con.execute("SELECT OrderID FROM Orders") 
          df = pd.DataFrame(rs.fetchmany(size=5)) 
          df.columns = rs.keys()

Querying Relational Databases with Pandas 

>>> df = pd.read_sql_query("SELECT * FROM Orders", engine)

Pickled Files

>>> import pickle
>>> with open('pickled_fruit.pkl', 'rb') as file: 
          pickled_data = pickle.load(file)

Matlab Files 

>>> import scipy.io
>>>filename= 'workspace.mat'
>>>mat= scipy.io.loadmat(filename)

HDF5 Files 

>>> import h5py
>>>filename= 'H-Hl LDSC 4 v1-815411200-4096.hdf5'
>>>data= h5py.File(filename, 'r')

Exploring Dictionaries 

Querying relational databases with pandas

>>> print(mat.keys()) #Print dictionary keys
>>> for key in data.keys(): #Print dictionary keys
              print(key)
     meta 
     quality 
     strain
>>> pickled_data.values() #Return dictionary values
>>> print(mat.items()) #Returns items in list format of (key, value) tuple pairs

Accessing Data Items with Keys

>>> for key in data ['meta'].keys() #Explore the HDF5
structure
               print(key) 
     Description 
     DescriptionURL 
     Detector
     Duration 
     GPSstart 
     Observatory 
     Type 
     UTCstart
#Retrieve the value for a key
>>> print(data['meta']['Description'].value)

Magic Commands 

!ls #List directory contents of files and directories
%cd .. #Change current working directory
%pwd #Return the current working directory path

OS Library 

>>> import os
>>> path = "/usr/tmp"
>>> wd = os.getcwd() #Store the name of current directory in a string
>>> os.listdir(wd) #Output contents of the directory in a list
>>> os.chdir(path) #Change current working directory
>>> os.rename( "test1.txt", #Rename a file
               "test2.txt")
>>> os.remove("test1. txt") #Delete an existing file
>>> os.mkdir("newdir") #Create a new directory
Related
Data Science Concept Vector Image

How to Become a Data Scientist in 8 Steps

Find out everything you need to know about becoming a data scientist, and find out whether it’s the right career for you!
Jose Jorge Rodriguez Salgado's photo

Jose Jorge Rodriguez Salgado

12 min

How to Become a Data Analyst in 2023: 5 Steps to Start Your Career

Learn how to become a data analyst and discover everything you need to know about launching your career, including the skills you need and how to learn them.
Elena Kosourova 's photo

Elena Kosourova

18 min

DC Data in Soccer Infographic.png

How Data Science is Changing Soccer

With the Fifa 2022 World Cup upon us, learn about the most widely used data science use-cases in soccer.
Richie Cotton's photo

Richie Cotton

Sports Analytics: How Different Sports Use Data Analytics

Discover how sports analytics works and how different sports use data to provide meaningful insights. Plus, discover what it takes to become a sports data analyst.
Kurtis Pykes 's photo

Kurtis Pykes

13 min

The 23 Top Python Interview Questions & Answers

Essential Python interview questions with examples for job seekers, final-year students, and data professionals.
Abid Ali Awan's photo

Abid Ali Awan

22 min

Getting started with Python cheat sheet

Python is the most popular programming language in data science. Use this cheat sheet to jumpstart your Python learning journey.
DataCamp Team's photo

DataCamp Team

8 min

See MoreSee More