Tutorials
python
+1

Pandas Sort Values

Finding interesting bits of data in a DataFrame is often easier if you change the rows' order. You can sort the rows by passing a column name to .sort_values().

In cases where rows have the same value (this is common if you sort on a categorical variable), you may wish to break the ties by sorting on another column. You can sort on multiple columns in this way by passing a list of column names.


Modifying the Order of Columns

You can change the rows' order by sorting them so that the most interesting data is at the top of the dataframe.

For example, when we apply sort_values() on the weight_kg column of the dogs dataframe, we get the lightest dog at the top, Stella the Chihuahua, and the heaviest dog at the bottom, Bernie the Saint Bernard.

dogs.sort_values("weight_kg")
      name        breed  color  height_cm  weight_kg date_of_birth
5   Stella    Chihuahua    Tan         18          2    2015-04-20
3   Cooper    Schnauzer   Gray         49         17    2011-12-11
0    Bella     Labrador  Brown         56         24    2013-07-01
1  Charlie       Poodle  Black         43         24    2016-09-16
2     Lucy    Chow Chow  Brown         46         24    2014-08-25
4      Max     Labrador  Black         59         29    2017-01-20
6   Bernie  St. Bernard  White         77         74    2018-02-27

Setting the ascending argument to False will sort the data the other way round, from heaviest to lightest dog.

dogs.sort_values("weight_kg", ascending=False)
      name        breed  color  height_cm  weight_kg date_of_birth
6   Bernie  St. Bernard  White         77         74    2018-02-27
4      Max     Labrador  Black         59         29    2017-01-20
0    Bella     Labrador  Brown         56         24    2013-07-01
1  Charlie       Poodle  Black         43         24    2016-09-16
2     Lucy    Chow Chow  Brown         46         24    2014-08-25
3   Cooper    Schnauzer   Gray         49         17    2011-12-11
5   Stella    Chihuahua    Tan         18          2    2015-04-20

Sorting by Multiple Variables

We can sort by multiple variables by passing a list of column names to sort_values. Here, we sort first by weight, then by height. Now, Charlie, Lucy, and Bella are ordered from shortest to tallest, even though they all weigh the same.

dogs.sort_values(["weight_kg", "height_cm"])
      name        breed  color  height_cm  weight_kg date_of_birth
5   Stella    Chihuahua    Tan         18          2    2015-04-20
3   Cooper    Schnauzer   Gray         49         17    2011-12-11
1  Charlie       Poodle  Black         43         24    2016-09-16
2     Lucy    Chow Chow  Brown         46         24    2014-08-25
0    Bella     Labrador  Brown         56         24    2013-07-01
4      Max     Labrador  Black         59         29    2017-01-20
6   Bernie  St. Bernard  White         77         74    2018-02-27

To change the direction values are sorted in, pass a list to the ascending argument to specify which direction sorting should be done for each variable. Now, Charlie, Lucy, and Bella are ordered from tallest to shortest.

dogs.sort_values(["weight_kg", "height_cm"], ascending=[True, False])
      name        breed  color  height_cm  weight_kg date_of_birth
5   Stella    Chihuahua    Tan         18          2    2015-04-20
3   Cooper    Schnauzer   Gray         49         17    2011-12-11
0    Bella     Labrador  Brown         56         24    2013-07-01
2     Lucy    Chow Chow  Brown         46         24    2014-08-25
1  Charlie       Poodle  Black         43         24    2016-09-16
4      Max     Labrador  Black         59         29    2017-01-20
6   Bernie  St. Bernard  White         77         74    2018-02-27

Interactive Example

In the following example, you will sort homelessness by the number of homeless individuals, from smallest to largest, and save this as homelessness_ind. Finally, you will print the head of the sorted DataFrame.

# Sort homelessness by individuals
homelessness_ind = homelessness.sort_values("individuals")

# Print the top few rows
print(homelessness_ind.head())

When we run the above code, it produces the following result:

                region         state  individuals  family_members  state_pop
50            Mountain       Wyoming        434.0           205.0     577601
34  West North Central  North Dakota        467.0            75.0     758080
7       South Atlantic      Delaware        708.0           374.0     965479
39         New England  Rhode Island        747.0           354.0    1058287
45         New England       Vermont        780.0           511.0     624358

Try it for yourself.

To learn more about sorting and subsetting the data, please see this video from our course Data Manipulation with pandas.

This content is taken from DataCamp’s Data Manipulation with pandas course by Maggie Matsui and Richie Cotton.