Data Manipulation with pandas Interactive Notes

Review and practice the concepts and skills you learned in DataCamp's Data Manipulation with pandas course! This is an interactive notebook powered by DataCamp Workspace.

Note: Some later examples depend on code in earlier examples. To make sure all variables and imports are available, click "Run All" at the top of this workspace.

Chapter 1: DataFrames

1.1 Introducing DataFrames

pandas is an essential Python package for data manipulation. In pandas, rectangular data is represented as a DataFrame object. Every value within a column has the same data type, but different columns can contain different data types.
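As a minimal sketch of the point about column data types, the example below builds a small DataFrame and inspects its `dtypes`. The `toy` DataFrame and its values are hypothetical and are not taken from the avocado dataset.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not from the avocado dataset)
toy = pd.DataFrame({
    "region": ["Albany", "Boston", "Chicago"],  # text column
    "avg_price": [1.33, 1.52, 1.41],            # floating-point column
    "bags": [8696, 9505, 8145],                 # integer column
})

# Each column holds a single data type, but the types differ between columns
print(toy.dtypes)
```

The exact dtype reported for the text column may vary with the pandas version, so no expected output is shown here.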

# Importing packages
import pandas as pd
import numpy as np

# Importing an avocado dataset as a DataFrame
avocado = pd.read_csv("data/avocado.csv")

When you get a new DataFrame to work with, the first thing you need to do is explore it and see what it contains. The following code cells show useful methods and attributes for this.

# Return the first few rows of a DataFrame
avocado.head()
# Compute some summary statistics for numerical columns
avocado.describe()
# Return the number of rows followed by the number of columns
avocado.shape
# Print the column names, their data types, and the non-null count per column (which reveals missing values)
avocado.info()

DataFrames consist of three different components, accessible using attributes.

# Return the data values in a 2D NumPy array
avocado.values
# Return the row labels
avocado.index
# Return the column names
avocado.columns
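
To see what the three components look like in practice, here is a small sketch on a hypothetical two-row DataFrame. The `toy` data is made up for illustration; the attributes are the same ones used on `avocado` above.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({"region": ["Albany", "Boston"], "avg_price": [1.33, 1.52]})

values = toy.values      # the data as a 2D NumPy array
row_labels = toy.index   # the row labels (0, 1, ... by default)
col_names = toy.columns  # the column names

print(values.shape)      # (2, 2)
print(list(row_labels))  # [0, 1]
print(list(col_names))   # ['region', 'avg_price']
```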

1.2 Sorting and subsetting

You can sort the rows of a DataFrame using the .sort_values() method, passing in column name(s) that you want to sort by.
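
A minimal sketch of sorting by one column and by several columns follows. The `toy` DataFrame is hypothetical. The `ascending` argument is an optional parameter of `.sort_values()` that accepts one boolean per sort column.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "region": ["Boston", "Albany", "Boston", "Albany"],
    "avg_price": [1.52, 1.33, 1.10, 1.75],
})

# Sort by a single column (ascending by default)
by_price = toy.sort_values("avg_price")

# Sort by region A-Z, then by price from highest to lowest within each region
by_region_then_price = toy.sort_values(
    ["region", "avg_price"], ascending=[True, False]
)

print(by_price)
print(by_region_then_price)
```

Sorting returns a new DataFrame and keeps the original row labels, so the index of the result is no longer in order.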