Explore a DataFrame
Use this template to get a solid understanding of the structure of your DataFrame and its values before jumping into a deeper analysis. This template leverages many of pandas' handy functions for the most fundamental exploratory data analysis steps, including inspecting column data types and distributions, creating exploratory visualizations, and counting unique and missing values.
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
# Load your dataset into a DataFrame
df = pd.read_csv("data/taxis.csv")
# Print the number of rows and columns
print("Number of rows and columns:", df.shape)
# Print out the first five rows
df.head()
Understanding columns and values
The info() function prints a concise summary of the DataFrame. For each column, you can find its name, data type, and the number of non-null values. This is useful for gauging whether there are many missing values and for understanding what data types you're dealing with.
df.info()
# Count the missing values in each column
df.isna().sum()
If there are missing values, you'll have to decide whether and how to deal with them. If you want to learn more about removing and replacing values, check out chapter 2 of DataCamp's Data Manipulation with pandas course.
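As a minimal sketch of the two most common options, the example below drops or fills missing values on a small made-up DataFrame. The column names and the choice of the median as a fill value are illustrative assumptions, not recommendations for your data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration
toy = pd.DataFrame({
    "fare": [7.5, np.nan, 12.0, 9.0],
    "payment": ["cash", "card", None, "card"],
})

# Option 1: drop every row that contains at least one missing value
dropped = toy.dropna()

# Option 2: fill missing values column by column
filled = toy.fillna({
    "fare": toy["fare"].median(),  # numeric column: fill with the median
    "payment": "unknown",          # categorical column: fill with a placeholder
})

print(dropped.shape)        # rows that survive dropping
print(filled.isna().sum())  # remaining missing values per column
```

Dropping is simplest but discards whole rows, while filling keeps every row at the cost of introducing values that were never observed. Which trade-off is acceptable depends on the analysis.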
The describe() function generates helpful descriptive statistics for each numeric column. Its output shows the count of non-missing values, the mean, the standard deviation, the minimum and maximum, and the 25th, 50th, and 75th percentiles. Note that missing values are excluded from these calculations.
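To see how missing values are left out of these statistics, here is a small sketch on a made-up single-column DataFrame (the column name and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value
toy = pd.DataFrame({"distance": [1.0, 2.0, np.nan, 3.0]})

stats = toy.describe()

# count reflects only the non-missing entries
print(stats.loc["count", "distance"])  # 3.0
# mean is computed over the three non-missing values
print(stats.loc["mean", "distance"])   # 2.0
# The 25%, 50%, and 75% rows hold the quartiles
print(stats.loc[["25%", "50%", "75%"], "distance"])
```

The same call on your own DataFrame is shown next.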
df.describe()
Use the unique() function to print out the unique values of a column:
df["pickup_borough"].unique()  # Replace with a column of interest
Use the value_counts() function to print out the number of rows for each unique value:
df["pickup_borough"].value_counts(  # Replace with a column of interest
    dropna=True  # Set to False if you want to include NaN values
)
Basic data visualizations