Pandas Cheat Sheet: Data Wrangling in Python

This cheat sheet is a quick reference for data wrangling with Pandas, complete with code samples.

24 de jun. de 2021 · 4 min lido

By now, you'll already know the Pandas library is one of the most preferred tools for data manipulation and analysis, and you'll have explored the fast, flexible, and expressive Pandas data structures, maybe with the help of DataCamp's Pandas Basics cheat sheet.

Yet, there is still much functionality that is built into this package to explore, especially when you get hands-on with the data: you'll need to reshape or rearrange your data, iterate over DataFrames, visualize your data, and much more. And this might be even more difficult than "just" mastering the basics.

That's why today's post introduces a new, more advanced Pandas cheat sheet.

It's a quick guide through the functionalities that Pandas can offer you when you get into more advanced data wrangling with Python.

(Do you want to learn more? Start our Data Manipulation with pandas course for free now or try out our Pandas DataFrame tutorial! )

Have this cheat sheet at your fingertips

Download PDF

The Pandas cheat sheet will guide you through some more advanced indexing techniques, DataFrame iteration, handling missing values or duplicate data, grouping and combining data, data functionality, and data visualization.

In short, everything that you need to complete your data manipulation with Python!

Don't miss out on our other cheat sheets for data science that cover Matplotlib, SciPy, Numpy, and the Python basics.

Reshape Data

Pivot

>>> df3= df2.pivot(index='Date', #Spread rows into columns          columns='Type',          values='Value')

Stack/ Unstack

>>>stacked= df5.stack() #Pivot a level of column	labels>>> stacked.unstack() #Pivot a level of index labels

Melt

>>> pd.melt(df2, #Gather columns into rows           id_vars=[''Date''],           value_vars=[''Type'', ''Value''],           value name=''Observations'')

Iteration

>>> df.iteritems() #{Column-index, Series) pairs>>> df.iterrows() #{Row-index, Series) pairs

Missing Data

>>> df.dropna() #Drop NaN values>>> df3.fillna(df3.mean()) #Fill NaN values with a predetermined value>>> df2.replace("a", "f") #Replace values with others

Advanced Indexing

Selecting

>>> df3.loc[:,(df3>1).any()] #Select cols with any vols >1>>> df3.loc[:,(df3>1).all()] #Select cols with vols> 1>>> df3.loc[:,df3.isnull().any()] #Select cols with NaN>>> df3.loc[:,df3.notnull().all()] #Select cols without NaN

Indexing With isin ()

>>> df[(df.Country.isin(df2.Type))] #Find some elements>>> df3.filter(iterns="a","b"]) #Filter on values>>> df.select(lambda x: not x%5) #Select specific elements

Where

>>> s.where(s > 0) #Subset the data

Query

>>> df6.query('second > first') #Query DataFrame

Setting/Resetting Index

>>> df.set_index('Country') #Set the index>>> df4 = df.reset_index() #Reset the index>>> df = df.rename(index=str, #Rename          DataFrame columns={"Country":"cntry",          "Capital":"cptl", "Population":"ppltn"})

Reindexing

>>>  s2   = s. reindex (['a','c','d','e',' b'])

Forward Filling

>>> df.reindex(range(4),          method='ffill')

Country	Capital	Population
0 Belgium	Brussels	11190846
1 India	New Dehli	1303171035
2 Brazil	Brasilia	207847528
3 Brazil	Brasilia	207847528

Backward Filling

>>> s3 = s.reindex(range(5),          method='bfill')

0	3
1	3
2	3
3	3
4	3

Multi-Indexing

>>>arrays= [np.array([1,2,3]),          np.array([5,4,3])]>>> df5 = pd.DataFrame(np.random.rand(3, 2), index=arrays)>>>tuples= list(zip(*arrays))>>>index= pd.Multilndex.from_tuples(tuples,               names= ['first','second'])>>> df6 = pd.DataFrame(np.random.rand(3, 2), index=index)>>> df2.set_index(["Date", "Type"])

Duplicate Data

>>> s3.unique() #Return unique values>>> df2.duplicated('Type') #Check duplicates>>> df2.drop_duplicates('Type', keep='last') #Drop duplicates>>> df.index.duplicated() #Check index duplicates

Grouping Data

Aggregation

>>> df2.groupby(by=['Date','Type']).mean()>>> df4.groupby(level=0).sum()>>> df4.groupby(level=0).agg({'a':lambda x:sum(x)/len (x), 'b': np.sum})

Transformation

>>> customSum = lambda x: (x+x%2)>>> df4.groupby(level=0).transform(customSum)

Combining Data

Merge

>>> pd.merge(data1,          data2,           how=' left',           on='X1')

>>> pd.merge(data1,          data2,           how='right',           on='X1')

>>> pd.merge(data1,          data2,           how='inner',           on='X1')

>>> pd.merge(data1,          data2,           how='outer',           on='X1')

Join

>>> data1.join(data2, how='right')

Concatenate

Vertical

>>> s.append(s2)

Horizontal/Vertical

>>> pd.concat([s,s2],axis=1, keys=['One','Two'])>>> pd.concat([datal, data2], axis=1, join='inner')

Dates

>>> df2['Date']= pd.to_datetime(df2['Date'])>>> df2['Date']= pd.date_range('2000-1-1',          periods=6,           freq='M')>>>dates= [datetime(2012,5,1), datetime(2012,5,2)]>>>index= pd.Datetimelndex(dates)>>>index= pd.date_range(datetime(2012,2,1), end, freq='BM')

Visualization

>>> import matplotlib.pyplot as plt>>> s.plot()>>> plt.show()

>>> df2.plot()>>> plt.show()

Tópicos

Python

Data Science

Relacionado

cheat-sheet

Pandas Cheat Sheet for Data Science in Python

A quick guide to the basics of the Python data analysis library Pandas, including code samples.

Karlijn Willems

cheat-sheet

Python for Data Science - A Cheat Sheet for Beginners

This handy one-page reference presents the Python basics that you need to do data science

Karlijn Willems

cheat-sheet

NumPy Cheat Sheet: Data Analysis in Python

This Python cheat sheet is a quick reference for NumPy beginners.

Karlijn Willems

cheat-sheet

Python For Data Science Cheat Sheet For Beginners

This cheat sheet covers the basics that you need to know to do data science with Python

Karlijn Willems

Tutorial

Python For Data Science - A Cheat Sheet For Beginners

This handy one-page reference presents the Python basics that you need to do data science

Karlijn Willems

Tutorial

Pandas Tutorial: DataFrames in Python

Explore data analysis with Python. Pandas DataFrames make manipulating your data easy, from selecting or replacing columns and indices to reshaping your data.

Karlijn Willems

Ver mais Ver mais