Data Manipulation with pandas

Pandas

.head() returns the first few rows (5 by default), the “head” of the DataFrame; pass a number to change the count, e.g. .head(10).
.info() prints the name, data type, and non-null count of each column, so you can spot missing values.
.shape is an attribute (no parentheses) holding the number of rows and columns of the DataFrame as a tuple.
.describe() calculates summary statistics (count, mean, std, min, quartiles, max) for each numeric column.
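
A minimal sketch of these four inspection tools. The dogs table here is made-up toy data (the names and numbers are assumptions for illustration); only the column names follow the examples used later in these notes.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["brown", "black", "brown", "gray", "black", "tan", "white"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25.0, 23.0, 22.0, 17.0, 29.0, 2.0, 74.0],
})

first_rows = dogs.head()       # first 5 rows by default
n_rows, n_cols = dogs.shape    # attribute, so no parentheses
summary = dogs.describe()      # numeric columns only by default

print(first_rows)
dogs.info()                    # prints column names, dtypes, non-null counts
print(n_rows, n_cols)
print(summary)
```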

.values: A two-dimensional NumPy array of the DataFrame's values (the data without the row and column labels).
.columns: An index of columns: the column names.
.index: An index for the rows: either row numbers or row names.
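
A small sketch of the three attributes, again on made-up data with row names set by hand so that .index shows labels rather than numbers.

```python
import pandas as pd

# Hypothetical toy data with row names as the index.
dogs = pd.DataFrame(
    {"height_cm": [56, 43, 46], "weight_kg": [25, 23, 22]},
    index=["Bella", "Charlie", "Lucy"],
)

values = dogs.values           # 2-D NumPy array, labels stripped
column_labels = dogs.columns   # Index object with the column names
row_labels = dogs.index        # Index object with the row names

print(values)
print(column_labels)
print(row_labels)
```

The pandas documentation recommends dogs.to_numpy() over .values in new code; both return the same array here.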

.sort_values(): sorts rows by the column name you pass, from smallest to largest by default; pass ascending=False to reverse the order.
To select only "col_a" of the DataFrame df: df["col_a"]
To select "col_a" and "col_b" of df, use: df[["col_a", "col_b"]]
Filtering rows on a condition:
dogs[dogs["height_cm"] > 60]
dogs[dogs["color"] == "tan"]
dogs[(dogs["height_cm"] > 60) & (dogs["color"] == "tan")]
Filtering on a categorical variable with the .isin() method:
colors = ["brown", "black", "tan"]
condition = dogs["color"].isin(colors)
dogs[condition]
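
The sorting, selecting, and filtering snippets above can be tried end to end with a sketch like this one. The data is invented for illustration; the column names match the snippets.

```python
import pandas as pd

# Hypothetical toy data.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Bernie"],
    "color": ["brown", "black", "tan", "gray", "tan"],
    "height_cm": [56, 43, 46, 49, 77],
})

# Sorting: one column descending, then two columns with mixed directions.
tallest_first = dogs.sort_values("height_cm", ascending=False)
by_color_then_height = dogs.sort_values(
    ["color", "height_cm"], ascending=[True, False]
)

# Selecting columns: single brackets give a Series, double brackets a DataFrame.
heights = dogs["height_cm"]
subset = dogs[["name", "height_cm"]]

# Filtering rows: each condition needs its own parentheses when combined with &.
tall_tan = dogs[(dogs["height_cm"] > 60) & (dogs["color"] == "tan")]
brown_black_tan = dogs[dogs["color"].isin(["brown", "black", "tan"])]

print(tallest_first)
print(tall_tan)
print(brown_black_tan)
```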

.set_index("column name"): makes that column the index; pass a list of column names to build a multi-level (hierarchical) index.

.sort_values(): sorts by the values of a column, in ascending or descending order; takes the column name (or a list of names) as an argument.
.sort_index(): sorts by the index labels; with a multi-level index, pass level= to choose which index level(s) to sort by and ascending= to set the direction.
Slicing with .loc (sort the index first, otherwise label slicing is unreliable):
-Slice rows with code like df.loc[("a", "b"):("c", "d")].
-Slice columns with code like df.loc[:, "e":"f"].
-Slice both ways with code like df.loc[("a", "b"):("c", "d"), "e":"f"].
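
A sketch of multi-level indexing and slicing, assuming a made-up dogs table indexed by breed and color. Unlike positional slicing, label slices with .loc include the end label.

```python
import pandas as pd

# Hypothetical toy data.
dogs = pd.DataFrame({
    "breed": ["Labrador", "Poodle", "Labrador", "Chihuahua", "Poodle"],
    "color": ["brown", "black", "black", "tan", "white"],
    "height_cm": [56, 43, 59, 18, 40],
    "weight_kg": [25, 23, 29, 2, 20],
})

# Two-level index, sorted so that label slicing works.
dogs_ind = dogs.set_index(["breed", "color"]).sort_index()

# Sort by one index level only (here the inner level).
by_color = dogs_ind.sort_index(level="color")

# Slice rows with tuples, columns with names, or both at once.
rows = dogs_ind.loc[("Labrador", "black"):("Poodle", "black")]
cols = dogs_ind.loc[:, "height_cm":"weight_kg"]
both = dogs_ind.loc[("Labrador", "black"):("Poodle", "black"),
                    "height_cm":"weight_kg"]

print(rows)
print(both)
```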

.agg() method: applies your own custom functions to a DataFrame or a single column. Define a function that takes a column and returns one value, then pass it to .agg(); pass a list of functions to compute several summaries at once.
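
As a sketch, here is a custom interquartile-range function passed to .agg(). The function name iqr and the data are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical toy data.
dogs = pd.DataFrame({
    "height_cm": [56, 43, 46, 49, 59],
    "weight_kg": [25, 23, 22, 17, 29],
})

def iqr(column):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return column.quantile(0.75) - column.quantile(0.25)

one_column = dogs["weight_kg"].agg(iqr)              # single number
per_column = dogs[["height_cm", "weight_kg"]].agg(iqr)  # one value per column
several = dogs["weight_kg"].agg([iqr, "median"])     # custom and built-in together

print(one_column)
print(per_column)
print(several)
```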
.drop_duplicates(): removes duplicate rows; use subset= to compare only certain columns.
.value_counts(): counts how often each value occurs, used for categorical data; normalize=True returns proportions instead of counts.
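
A sketch of counting categories without double counting, assuming a hypothetical table of vet visits in which the same dog can appear more than once.

```python
import pandas as pd

# Hypothetical toy data: one row per visit, so dogs repeat.
visits = pd.DataFrame({
    "name": ["Bella", "Bella", "Max", "Max", "Lucy"],
    "breed": ["Labrador", "Labrador", "Labrador", "Labrador", "Poodle"],
    "visit": [1, 2, 1, 2, 1],
})

# Keep one row per (name, breed) pair before counting breeds.
unique_dogs = visits.drop_duplicates(subset=["name", "breed"])

counts = unique_dogs["breed"].value_counts()                     # most frequent first
proportions = unique_dogs["breed"].value_counts(normalize=True)  # fractions summing to 1

print(unique_dogs)
print(counts)
print(proportions)
```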

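.groupby(categorical variable): computes summary statistics for each group in one call, instead of subsetting on every category and summarizing each subset separately.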
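
A sketch comparing the manual approach with .groupby(), on invented data.

```python
import pandas as pd

# Hypothetical toy data.
dogs = pd.DataFrame({
    "color": ["brown", "black", "brown", "black", "tan"],
    "weight_kg": [25.0, 23.0, 22.0, 29.0, 2.0],
})

# Manual approach: one filter and one summary per category.
brown_mean = dogs[dogs["color"] == "brown"]["weight_kg"].mean()

# Grouped approach: every category at once.
mean_by_color = dogs.groupby("color")["weight_kg"].mean()
stats_by_color = dogs.groupby("color")["weight_kg"].agg(["min", "max", "sum"])

print(mean_by_color)
print(stats_by_color)
```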
Pivot tables are the standard way of aggregating data in spreadsheets.
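The .pivot_table() method is an alternative to .groupby().
dogs.pivot_table(values="weight_kg", index="color") gives the mean weight for each color (mean is the default aggfunc)
dogs.pivot_table(values="weight_kg", index="color", aggfunc=np.median) same as the previous one, but computes the median instead of the mean (needs import numpy as np; aggfunc="median" also works)
dogs.pivot_table(values="weight_kg", index="color", columns="breed", fill_value=0) groups by two variables: colors as rows, breeds as columns
fill_value replaces the missing values in the pivot table (combinations with no data) with the value you give.
margins=True is a shortcut for when you pivoted by two variables but also want to summarize each of them separately: it adds an "All" row and an "All" column holding the aggregate over each column and row (the mean by default, not a total).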
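
The pivot table calls above can be sketched on a small invented table. The "All" margins are computed from the underlying data, so the zeros inserted by fill_value do not affect them.

```python
import pandas as pd

# Hypothetical toy data.
dogs = pd.DataFrame({
    "breed": ["Labrador", "Labrador", "Poodle", "Poodle", "Chihuahua"],
    "color": ["brown", "black", "black", "white", "tan"],
    "weight_kg": [25.0, 29.0, 23.0, 20.0, 2.0],
})

# Mean weight per color (mean is the default aggfunc).
mean_by_color = dogs.pivot_table(values="weight_kg", index="color")

# Median instead of mean; the string name avoids importing NumPy.
median_by_color = dogs.pivot_table(values="weight_kg", index="color",
                                   aggfunc="median")

# Two variables: colors as rows, breeds as columns, gaps filled with 0.
by_color_breed = dogs.pivot_table(values="weight_kg", index="color",
                                  columns="breed", fill_value=0)

# Same table plus an "All" row and column with the overall means.
with_margins = dogs.pivot_table(values="weight_kg", index="color",
                                columns="breed", fill_value=0, margins=True)

print(by_color_breed)
print(with_margins)
```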
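.isna(): returns True or False for every value, marking which ones are missing
.isna().any(): shows, for each column, whether it contains at least one missing value
.isna().sum(): number of missing values in each column
.dropna(): removes the rows that contain missing values
.fillna(0): replaces missing values with 0
.value_counts(): counts the occurrences of each category in a column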
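
The missing-value methods listed above can be sketched as follows, on an invented table with two missing weights.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing weights.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "height_cm": [56, 43, 46, 49],
    "weight_kg": [25.0, np.nan, 22.0, np.nan],
})

mask = dogs.isna()               # True where a value is missing
any_missing = dogs.isna().any()  # one True/False per column
n_missing = dogs.isna().sum()    # count of missing values per column

complete_rows = dogs.dropna()    # drops rows with any missing value
filled = dogs.fillna(0)          # replaces missing values with 0

print(any_missing)
print(n_missing)
print(complete_rows)
print(filled)
```

Neither .dropna() nor .fillna() changes dogs itself; each returns a new DataFrame.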