Tutorials
python
+1

pandas to CSV

DataFrames can be easy to create, but typing out your data entry-by-entry is not the most efficient way to get your data into a DataFrame.

An easy way to create a DataFrame is by using a CSV. But what is a CSV file? CSV or comma-separated values is a common data storage file type. It is designed to store tabular data, just like a pandas DataFrame. It's a text file where each row of data has its own line, and a comma separates each value.

CSV to a DataFrame

Below is an example of a csv file:

new_dogs.csv
name, breed, height_cm, weight_kg, d_o_b
Ginger, Dachshund, 22, 10, 2019-03-14
Scout, Dalmatian, 59, 25, 2019-05-09

We can put the above data in a DataFrame using the pandas library using the read_csv function and pass it the file path of the CSV.

import pandas as pd
new_dogs = pd.read_csv("new_dogs.csv")
print(new_dogs)
      name       breed   height_cm  weight_kg   date_of_birth
0    Ginger  Dachshund          22         10      2019-03-14
1    Scout   Dalmatian          59         25      2019-05-09

DataFrame to a CSV

Now that the data is in DataFrame form, we can manipulate it by adding a body mass index column.

new_dogs["bmi"] = new_dogs["weight_kg"] / (new_dogs["height_cm"] / 100) ** 2

print(new_dogs)
      name       breed   height_cm  weight_kg   date_of_birth            bmi
0    Ginger  Dachshund          22         10      2019-03-14     206.611570
1    Scout   Dalmatian          59         25      2019-05-09      71.818443

Now that the data has been changed, let's create an updated CSV file. To convert the DataFrame back to a CSV, you can use the following function.

new_dogs.to_csv("new_dogs_with_bmi.csv")

When we run the above code, it produces the following result:

new_dogs_with_bmi.csv
name, breed, height_cm, weight_kg, d_o_b, bmi
Ginger, Dachshund, 22, 10, 2019-03-14, 206.611570
Scout, Dalmatian, 59, 25, 2019-05-09, 71.818443

Interactive Example of a DataFrame to a CSV

In the below example, you Sort the airline_totals by the values of column bumps_per_10k from highest to lowest (descending order), storing as airline_totals_sorted. You will then Print your sorted DataFrame. Finally, save the sorted DataFrame as a CSV called "airline_totals_sorted.csv".

# Create airline_totals_sorted
airline_totals_sorted = airline_totals.sort_values("bumps_per_10K", ascending=FALSE)

# Print airline_totals_sorted
print(airline_totals_sorted)

# Save as airline_totals_sorted.csv
airline_totals_sorted.to_csv("airline_totals_sorted.csv")

When we run the above code, it produces the following result:

                     nb_bumped  total_passengers  bumps_per_10k
airline                                                        
EXPRESSJET AIRLINES       3326          27858678          1.194
SPIRIT AIRLINES           2920          32304571          0.904
SOUTHWEST AIRLINES       18585         228142036          0.815
JETBLUE AIRWAYS           3615          53245866          0.679
SKYWEST AIRLINES          3094          47091737          0.657
AMERICAN AIRLINES        11115         197365225          0.563
FRONTIER AIRLINES         1228          22954995          0.535
ALASKA AIRLINES           1392          36543121          0.381
UNITED AIRLINES           4941         134468897          0.367
VIRGIN AMERICA             242          12017967          0.201
DELTA AIR LINES           1591         197033215          0.081
HAWAIIAN AIRLINES          122          16577572          0.074

Try it for yourself.

To learn more about reading and writing CSVs into pandas, please see this video from our course Data Manipulation with pandas.

This content is taken from DataCamp’s Data Manipulation with pandas course by Maggie Matsui.