Summary Notes: Data Manipulation with pandas

1. Inspecting a DataFrame:

  • .head(): Returns the first rows of the DataFrame (five by default).
  • .info(): Prints column names, data types, and non-null counts, which reveal missing values.
  • .shape: An attribute holding the number of rows and columns as a tuple.
  • .describe(): Calculates summary statistics (count, mean, quartiles, and so on) for each numeric column.
  • Example: homelessness.head(), homelessness.info(), homelessness.shape, homelessness.describe()
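
The inspection calls above can be tried on a small stand-in table. This is a minimal sketch: the rows below are made up for illustration and are not the course's homelessness data, and the column names are assumed from the examples in these notes.

```python
import pandas as pd

# Hypothetical stand-in for the homelessness dataset; all values are made up.
homelessness = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific", "Mountain"],
    "state": ["A", "B", "C", "D"],
    "individuals": [300, 100, 200, 400],
    "family_members": [30, 10, 50, 20],
})

first_rows = homelessness.head(2)     # first 2 rows; the default is 5
n_rows, n_cols = homelessness.shape   # attribute (no parentheses), a (rows, columns) tuple
summary = homelessness.describe()     # by default, statistics for numeric columns only
homelessness.info()                   # prints its report and returns None

print(first_rows)
print(summary)
```

Note that .shape is an attribute while the other three are methods, so only .shape is written without parentheses.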

2. Parts of a DataFrame:

  • .values: A two-dimensional NumPy array of values.
  • .columns: An index of column names.
  • .index: An index for rows (row numbers or names).
  • Example: homelessness.values, homelessness.columns, homelessness.index
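
As a rough illustration of the three parts, the sketch below pulls them out of a small made-up table. The data and column names are assumptions for the example, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the homelessness dataset; all values are made up.
homelessness = pd.DataFrame({
    "state": ["A", "B", "C"],
    "individuals": [300, 100, 200],
    "family_members": [30, 10, 50],
})

values = homelessness.values     # 2-D NumPy array with one row per DataFrame row
columns = homelessness.columns   # Index object holding the column names
index = homelessness.index       # row labels; a default integer range here

print(values.shape)
print(list(columns))
print(list(index))
```

All three are attributes, so none of them takes parentheses.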

3. Sorting Rows:

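  • Sorting by one column: df.sort_values("column_name") (ascending by default; pass ascending=False to reverse)
  • Sorting by multiple columns: df.sort_values(["col_name1", "col_name2"]), with an optional list such as ascending=[True, False] to set the direction per column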
  • Example: homelessness.sort_values("individuals"), homelessness.sort_values(["region", "family_members"], ascending=[True, False])
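
The two sorting patterns can be sketched on a small made-up table. The rows are invented for illustration, and the column names "individuals" and "family_members" are assumed from the subsetting examples in these notes.

```python
import pandas as pd

# Hypothetical stand-in for the homelessness dataset; all values are made up.
homelessness = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific", "Mountain"],
    "state": ["A", "B", "C", "D"],
    "individuals": [300, 100, 200, 400],
    "family_members": [30, 10, 50, 20],
})

# One column, ascending by default.
by_individuals = homelessness.sort_values("individuals")

# Two columns: region ascending, then family_members descending within each region.
by_region_family = homelessness.sort_values(
    ["region", "family_members"], ascending=[True, False]
)

print(by_individuals)
print(by_region_family)
```

sort_values returns a new DataFrame and leaves the original row order untouched, so the result has to be assigned to a name to be kept.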

4. Subsetting Columns:

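  • Selecting a single column: df["column_name"] (returns a Series)
  • Selecting multiple columns: df[["col_name1", "col_name2"]] (the inner brackets are a list of names; returns a DataFrame with columns in the order listed)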
  • Example: individuals = homelessness["individuals"], state_fam = homelessness[["state", "family_members"]], ind_state = homelessness[["individuals", "state"]]
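
The subsetting examples above can be run against a small made-up table to see what each form returns. The data here is invented for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the homelessness dataset; all values are made up.
homelessness = pd.DataFrame({
    "state": ["A", "B", "C"],
    "individuals": [300, 100, 200],
    "family_members": [30, 10, 50],
})

# Single brackets with one name give a Series.
individuals = homelessness["individuals"]

# Double brackets (a list of names) give a DataFrame.
state_fam = homelessness[["state", "family_members"]]

# Columns come back in the order they are listed, not their original order.
ind_state = homelessness[["individuals", "state"]]

print(type(individuals).__name__)
print(type(state_fam).__name__)
print(list(ind_state.columns))
```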