Course Notes: Reshaping Data with pandas

When to use Long Data vs Wide Data:

Wide Data -> Interested in the Objects (ex.Individual Players)

Long Data -> Interested in the observations (ex. nationality, age, etc.)

Reshaping a dataframe:

Wide to Long Transformation:

use pandas: ".wide_to_long()" OR ".melt()"

Long to Wide Transformation:

use pandas: ".pivot()" OR ".pivot_table()"

Using the PIVOT Method

df.pivot(index="name of column we want to turn INTO rows", columns="name of column we want to turn INTO columns", values="name of column we want to turn INTO values")

Using the PIVOT_TABLE Method

The .pivot_table() method is similar to the pivot method, but it aggregates the values and is more of a summary table

df.pivot_table(index="name of column(s) we want to turn INTO rows", columns="name of column we want to turn INTO columns", values="name of column we want to turn INTO values", aggfunc="name of the aggregate you want to perform", OPTIONAL: margins:True/False (when set to true, ALL columns and rows will be totalled))

pivot_table() method can also work with multi-level indexes if you input a [LIST] into the 'index' variable.

Using MELT Method

df.melt(id_vars=["names of columns to use as IDENTIFIER variables"], value_vars=["names of the columns you want to melt"], var_name="name to use for the variable column", value_name="name to use for the value column")

Using WIDE_TO_LONG() FUNCTION (PANDAS)

pd.wide_to_long(df, stubnames=["names of desired value columns"], i=["names of columns to be used as IDENTIFIERS"], j="name of column we want to use as variable", OPTIONAL: sep:"delimeter after stubnames if applicable", suffix="if suffix is not numeric")

Reshaping String Columns

use .str to call string methods. for example .str.split("delimeter")

If we want to just call the first portion of the split method, we can use: str.split("Delimeter").str.get(0)

If we want to create new columns from one split, we can do something like this:

books[['main_title', 'subtitle']]= books['title'].str.split(":", expand=True)

books.drop('title', axis=0, inplace=True)

To concatinate two columns, use the .str.cat Method

books['author_firstname'].str.cat(books['author_lastname'], sep=" ")

CREATING A MULTI-LEVEL-INDEX

Use .set_index method([columns], inplace=True/False)

We can also use .from_arrays(array_name, names=[index_names])

df.stack() method to rearrange the innermost column index to be an innermost row index

# Write and run code here