Sep 2020 · 4 min read

You are never stuck with just the data you are given. Instead, you can add new columns to a DataFrame. This operation goes by many names, such as transforming, mutating, and feature engineering.

You can create new columns from scratch, but it is also common to derive them from other columns, for example, by adding columns together or by changing their units.
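As a minimal sketch of both approaches, the snippet below builds a small, hypothetical `dogs` DataFrame (the `species` and `vaccinated` columns are invented for illustration). It creates two columns from scratch and one derived column. A scalar on the right-hand side is broadcast to every row, while a list must supply exactly one value per row.

```python
import pandas as pd

# A small, hypothetical DataFrame used only for illustration
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy"],
    "height_cm": [56, 43, 46],
    "weight_kg": [24, 24, 24],
})

# From scratch: a scalar is broadcast to every row
dogs["species"] = "dog"

# From scratch: a list must contain one value per row
dogs["vaccinated"] = [True, False, True]

# Derived: computed from an existing column (here, a unit change)
dogs["height_m"] = dogs["height_cm"] / 100

print(dogs)
```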

## Deriving a Column

Using a dog dataset, let's say you want to add a new column to your DataFrame that has each dog's height in meters instead of centimeters.

On the left-hand side of the equals sign, you use square brackets with the name of the new column you want to create, in this case `height_m`. On the right-hand side, you put the calculation.

```python
dogs["height_m"] = dogs["height_cm"] / 100
print(dogs)
```
```
        name        breed   color   height_cm   weight_kg   date_of_birth   height_m
0      Bella     Labrador   Brown          56          24      2013-07-01       0.56
1    Charlie       Poodle   Black          43          24      2016-09-16       0.43
2       Lucy    Chow Chow   Brown          46          24      2014-08-25       0.46
3     Cooper    Schnauzer    Gray          49          17      2016-09-16       0.49
4        Max     Labrador   Black          59          29      2016-09-16       0.59
5     Stella    Chihuahua     Tan          18           2      2016-09-16       0.18
6     Bernie  St. Bernard   White          77          74      2016-09-16       0.77
```

Notice that both the existing column and the derived column are now in the DataFrame you modified.
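Square-bracket assignment modifies the DataFrame in place. If you would rather keep the original untouched, pandas also provides `DataFrame.assign()`, which returns a new DataFrame with the extra column. This is an alternative the original lesson does not use; the two-row `dogs` frame below is a hypothetical stand-in for the full dataset.

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie"],
    "height_cm": [56, 43],
})

# .assign() returns a NEW DataFrame with the extra column;
# the original `dogs` is left unchanged
dogs_m = dogs.assign(height_m=dogs["height_cm"] / 100)

print(dogs.columns.tolist())    # ['name', 'height_cm']
print(dogs_m.columns.tolist())  # ['name', 'height_cm', 'height_m']
```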

In this example, you will calculate a doggy mass index and add it as a column to your DataFrame. BMI stands for body mass index, which is calculated as weight in kilograms divided by the square of height in meters.
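Written as a formula and worked through for the first dog in the table (Bella: 24 kg, 0.56 m):

$$
\text{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2},
\qquad
\text{Bella: } \frac{24}{0.56^2} = \frac{24}{0.3136} \approx 76.53
$$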

```python
# Store the result in a new "bmi" column instead of overwriting "height_m"
dogs["bmi"] = dogs["weight_kg"] / dogs["height_m"] ** 2
print(dogs.head())
```
```
        name        breed   color   height_cm   weight_kg   date_of_birth   height_m          bmi
0      Bella     Labrador   Brown          56          24      2013-07-01       0.56    76.530612
1    Charlie       Poodle   Black          43          24      2016-09-16       0.43   129.799892
2       Lucy    Chow Chow   Brown          46          24      2014-08-25       0.46   113.421550
3     Cooper    Schnauzer    Gray          49          17      2016-09-16       0.49    70.803832
4        Max     Labrador   Black          59          29      2016-09-16       0.59    83.309394
```

Again, the new column goes on the left-hand side of the equals sign, but this time the calculation involves two columns.
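The arithmetic between the two columns is element-wise: pandas matches the values by row index, so each dog's weight is divided by that same dog's squared height. The self-contained sketch below rebuilds only the first five rows of the dog data shown above (with the columns the calculation needs) to check the BMI values.

```python
import pandas as pd

# First five rows of the dog data shown above (only the columns needed here)
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max"],
    "height_cm": [56, 43, 46, 49, 59],
    "weight_kg": [24, 24, 24, 17, 29],
})

dogs["height_m"] = dogs["height_cm"] / 100

# Row i of weight_kg is divided by row i of height_m squared
dogs["bmi"] = dogs["weight_kg"] / dogs["height_m"] ** 2

print(dogs[["name", "bmi"]].round(2))
```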

## Adding a Column with Multiple Manipulations

The real power of pandas comes in when you combine all the skills that you have learned so far. Let's figure out the names of skinny, tall dogs.

First, to define the skinny dogs, you take the subset of dogs that have a BMI of less than 100. Next, you sort the rows by height in descending order so that the tallest skinny dogs are at the top.

Finally, you keep only the columns you are interested in.

```python
# Keep only dogs with a BMI below 100
bmi_lt_100 = dogs[dogs["bmi"] < 100]
# Sort them from tallest to shortest
bmi_lt_100_height = bmi_lt_100.sort_values("height_cm", ascending=False)
# Select only the columns of interest
print(bmi_lt_100_height[["name", "height_cm", "bmi"]])
```
```
        name      height_cm           bmi
4        Max             59     83.309394
0      Bella             56     76.530612
3     Cooper             49     70.803832
5     Stella             18     61.728395
```

Here, you can see that `Max` is the tallest dog with a BMI below 100.
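The same three steps (filter, sort, select) can also be written as a single method chain, which avoids the intermediate variables. This is a stylistic alternative rather than the lesson's own code; the sketch rebuilds the seven dogs from the table above (only the columns it needs) and computes BMI directly from `height_cm`.

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})
dogs["bmi"] = dogs["weight_kg"] / (dogs["height_cm"] / 100) ** 2

# 1. filter rows, 2. sort by height (tallest first), 3. keep selected columns
skinny_tall = (
    dogs[dogs["bmi"] < 100]
    .sort_values("height_cm", ascending=False)
    .loc[:, ["name", "height_cm", "bmi"]]
)

print(skinny_tall)
```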

## Interactive Example

In the example below, you add a new column named `total` to the `homelessness` DataFrame, containing the sum of the `individuals` and `family_members` columns. You then add another column, `p_individuals`, containing the proportion of homeless people in each state who are individuals. Finally, you print the `homelessness` DataFrame.

```python
# Add total col as sum of individuals and family_members
homelessness["total"] = homelessness["individuals"] + homelessness["family_members"]

# Add p_individuals col as proportion of individuals
homelessness["p_individuals"] = homelessness["individuals"] / homelessness["total"]

# See the result
print(homelessness)
```

When we run the above code, it produces the following result:

```
                region                 state  individuals  family_members  state_pop     total  p_individuals
0   East South Central               Alabama       2570.0           864.0    4887681    3434.0          0.748
1              Pacific                Alaska       1434.0           582.0     735139    2016.0          0.711
2             Mountain               Arizona       7259.0          2606.0    7158024    9865.0          0.736
3   West South Central              Arkansas       2280.0           432.0    3009733    2712.0          0.841
4              Pacific            California     109008.0         20964.0   39461588  129972.0          0.839
...
48      South Atlantic         West Virginia       1021.0           222.0    1804291    1243.0          0.821
49  East North Central             Wisconsin       2740.0          2167.0    5807406    4907.0          0.558
50            Mountain               Wyoming        434.0           205.0     577601     639.0          0.679
```
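To try this without the full dataset, the sketch below rebuilds only the first five rows shown above (keeping the `state`, `individuals`, and `family_members` columns) and applies the same two assignments. Rounding is applied only when printing, so the stored `p_individuals` column keeps full precision.

```python
import pandas as pd

# First five rows of the homelessness data shown above (subset of columns)
homelessness = pd.DataFrame({
    "state": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "individuals": [2570.0, 1434.0, 7259.0, 2280.0, 109008.0],
    "family_members": [864.0, 582.0, 2606.0, 432.0, 20964.0],
})

homelessness["total"] = homelessness["individuals"] + homelessness["family_members"]
homelessness["p_individuals"] = homelessness["individuals"] / homelessness["total"]

# Round only for display; the stored column is not modified
print(homelessness[["state", "total", "p_individuals"]].round(3))
```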

Also take a look at DataCamp's tutorial on How To Drop Columns in Pandas.

This content is taken from DataCamp’s Data Manipulation with pandas course by Maggie Matsui and Richie Cotton.
