DataFrames can be easy to create, but typing out your data entry-by-entry is not the most efficient way to get your data into a DataFrame.
An easy way to create a DataFrame is by using a CSV. But what is a CSV file? CSV or comma-separated values is a common data storage file type. It is designed to store tabular data, just like a pandas DataFrame. It's a text file where each row of data has its own line, and a comma separates each value.
CSV to a DataFrame
Below is an example of a csv file:
new_dogs.csv
name, breed, height_cm, weight_kg, d_o_b
Ginger, Dachshund, 22, 10, 2019-03-14
Scout, Dalmatian, 59, 25, 2019-05-09
We can put the above data in a DataFrame using the pandas
library using the read_csv
function and pass it the file path of the CSV.
import pandas as pd
new_dogs = pd.read_csv("new_dogs.csv")
print(new_dogs)
name breed height_cm weight_kg date_of_birth
0 Ginger Dachshund 22 10 2019-03-14
1 Scout Dalmatian 59 25 2019-05-09
DataFrame to a CSV
Now that the data is in DataFrame form, we can manipulate it by adding a body mass index column.
new_dogs["bmi"] = new_dogs["weight_kg"] / (new_dogs["height_cm"] / 100) ** 2
print(new_dogs)
name breed height_cm weight_kg date_of_birth bmi
0 Ginger Dachshund 22 10 2019-03-14 206.611570
1 Scout Dalmatian 59 25 2019-05-09 71.818443
Now that the data has been changed, let's create an updated CSV file. To convert the DataFrame back to a CSV, you can use the following function.
new_dogs.to_csv("new_dogs_with_bmi.csv")
When we run the above code, it produces the following result:
new_dogs_with_bmi.csv
name, breed, height_cm, weight_kg, d_o_b, bmi
Ginger, Dachshund, 22, 10, 2019-03-14, 206.611570
Scout, Dalmatian, 59, 25, 2019-05-09, 71.818443
Interactive Example of a DataFrame to a CSV
In the below example, you Sort the airline_totals
by the values of column bumps_per_10k
from highest to lowest (descending order), storing as airline_totals_sorted
. You will then Print your sorted DataFrame. Finally, save the sorted DataFrame as a CSV called "airline_totals_sorted.csv"
.
# Create airline_totals_sorted
airline_totals_sorted = airline_totals.sort_values("bumps_per_10K", ascending=FALSE)
# Print airline_totals_sorted
print(airline_totals_sorted)
# Save as airline_totals_sorted.csv
airline_totals_sorted.to_csv("airline_totals_sorted.csv")
When we run the above code, it produces the following result:
nb_bumped total_passengers bumps_per_10k
airline
EXPRESSJET AIRLINES 3326 27858678 1.194
SPIRIT AIRLINES 2920 32304571 0.904
SOUTHWEST AIRLINES 18585 228142036 0.815
JETBLUE AIRWAYS 3615 53245866 0.679
SKYWEST AIRLINES 3094 47091737 0.657
AMERICAN AIRLINES 11115 197365225 0.563
FRONTIER AIRLINES 1228 22954995 0.535
ALASKA AIRLINES 1392 36543121 0.381
UNITED AIRLINES 4941 134468897 0.367
VIRGIN AMERICA 242 12017967 0.201
DELTA AIR LINES 1591 197033215 0.081
HAWAIIAN AIRLINES 122 16577572 0.074
To learn more about reading and writing CSVs into pandas
, please see this video from our course Data Manipulation with pandas.
This content is taken from DataCamp’s Data Manipulation with pandas course by Maggie Matsui.
← Back to tutorial