
How to Save a Pandas DataFrame to CSV

In Pandas, you can save a DataFrame to a CSV file using the df.to_csv('your_file_name.csv', index=False) method, where df is your DataFrame and index=False prevents an index column from being added.
Jun 26, 2024  · 7 min read

If you create a DataFrame or modify an existing one, you may want to export it so that other people can work with it, or save it for your own future work. For example, you may have to submit a prediction from a machine learning project to your examiner or send a modified dataset to a coworker.

In Python, you can export a DataFrame as a CSV file using Pandas’ .to_csv() method.

In this article, I’ll walk you through the main steps of the process and explain the method's parameters.

If you want to learn more about Pandas, check out this course on Data Manipulation with Pandas.

I’ll start with a quick answer for those of you in a hurry, and then I’ll get into the details.

Short Answer: How to Save Pandas DataFrame to CSV

To save a Pandas DataFrame as a CSV, use the DataFrame.to_csv() method: 

df.to_csv('your_file_name.csv', index=False)

Replace 'your_file_name.csv' with the desired name and path for your file. The index=False argument prevents Pandas from adding an index column to your CSV.
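If you want to confirm that the export worked, you can read the file straight back. This is just a quick sanity check on the file name used above; check_df is a name I'm introducing for the example.

import pandas as pd

# Read the exported file back in and inspect the first rows
check_df = pd.read_csv('your_file_name.csv')
print(check_df.head())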

Creating a Pandas DataFrame

Now, let’s look at the longer answer. Instead of creating a DataFrame from scratch, I’ll import a dataset, modify it, and save the resulting DataFrame in CSV format. The first step is to import the necessary libraries:

import pandas as pd
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler

Then, we can import a seaborn dataset. The imported dataset is already a Pandas DataFrame, so we don’t need to transform it.

# We’ll import the 'titanic' dataset from seaborn
df = sns.load_dataset('titanic')
print(df.head())
survived  pclass     sex   age  sibsp  parch     fare embarked  class  
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third 
    who  adult_male deck  embark_town alive  alone  
0    man        True  NaN  Southampton    no  False  
1  woman       False    C    Cherbourg   yes  False  
2  woman       False  NaN  Southampton   yes   True  
3  woman       False    C  Southampton   yes  False  
4    man        True  NaN  Southampton    no   True  

Modifying a Pandas DataFrame

Now, let’s do some processing to get a new DataFrame. Since my purpose is to explore the .to_csv() method from Pandas, I’ll only do a min-max normalization on numerical variables.

scaler = MinMaxScaler()
# Choose the columns that have integer or float data types
numerical_columns = df.select_dtypes(include=['float64', 'int64']).columns
new_df = df.copy()
# The min-max scaler will represent all numbers on a 0-to-1 scale
new_df[numerical_columns] = scaler.fit_transform(df[numerical_columns])
print(new_df.head())
survived  pclass     sex       age  sibsp  parch      fare embarked    
0       0.0     1.0    male  0.271174  0.125    0.0  0.014151        S  
1       1.0     0.0  female  0.472229  0.125    0.0  0.139136        C  
2       1.0     1.0  female  0.321438  0.000    0.0  0.015469        S  
3       1.0     0.0  female  0.434531  0.125    0.0  0.103644        S  
4       0.0     1.0    male  0.434531  0.000    0.0  0.015713        S  
   class    who  adult_male deck  embark_town alive  alone
0  Third    man        True  NaN  Southampton    no  False
1  First  woman       False    C    Cherbourg   yes  False
2  Third  woman       False  NaN  Southampton   yes   True
3  First  woman       False    C  Southampton   yes  False
4  Third    man        True  NaN  Southampton    no   True

Saving a Pandas DataFrame as CSV

We’ll use the DataFrame.to_csv() method to save the modified DataFrame as a CSV file.

new_df.to_csv('new_titanic.csv') 

This will save the CSV file in the current directory. If you need a refresher, read this tutorial on how to get the current directory in Python.
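If you'd rather save the file somewhere other than the current directory, you can pass a relative or absolute path instead of a bare file name. Here's a minimal sketch; the data/ folder is a hypothetical example and is created first so the write doesn't fail.

from pathlib import Path

# Hypothetical output folder; create it if it doesn't exist yet
output_dir = Path('data')
output_dir.mkdir(exist_ok=True)

# to_csv accepts a path-like object, so the folder can be part of the path
new_df.to_csv(output_dir / 'new_titanic.csv')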

The way I called this method ensured all parameters were set to their default values. Now, I’ll present the parameters and how you can change their values according to your needs.

Parameters of DataFrame.to_csv() Explained

Here’s the same .to_csv() method listed with all the parameters:

import csv  # needed for the quoting constants
import os   # needed for os.linesep

new_df.to_csv(
    path_or_buf='new_titanic.csv',
    sep=',',
    na_rep='',
    float_format=None,
    columns=None,
    header=True,
    index=True,
    index_label=None,
    mode='w',
    encoding='utf-8',
    quoting=csv.QUOTE_MINIMAL,
    lineterminator=os.linesep,
    storage_options=None,
    compression='infer',
    quotechar='"',
    chunksize=None,
    date_format=None,
    doublequote=True,
    escapechar=None,
    decimal='.',
    errors='strict'
)

All of these are the documented default values, so this call behaves exactly like the short new_df.to_csv('new_titanic.csv') from before.

The first parameter can be either a string, which will be the name of the file, or a file object. Look at this example:

with open('new_titanic.csv', 'w', encoding='utf-8', newline='') as file:
    new_df.to_csv(file)

Separators and missing values

The next parameter is the separator, sep, which defines how the values are separated in the file. The default separator is the comma, but you can change it to a tab or a semicolon, which is useful when your exported data needs to follow a specific format. Also, when the data itself contains commas, as free text often does, keeping the comma as a separator forces extra quoting.

For example, if one of your cells contains ‘Hello, world’ and you keep the comma separator, pandas has to wrap that value in quotation marks so it stays a single field. Switching to a tab or semicolon separator avoids the quoting and can make the file slightly smaller (a concrete sketch follows the tab example below).

new_df.to_csv('new_titanic.csv', sep='\t')
# Reading the file back without telling read_csv about the tab separator
# leaves each row as one long tab-joined string
new_df_tab = pd.read_csv('new_titanic.csv')
print(new_df_tab.head())
0\t0\t3\tmale\t22.0\t1\t0\t7.25\tS\tThird\tman...                                                                 
1\t1\t1\tfemale\t38.0\t1\t0\t71.2833\tC\tFirst...                                                                 
2\t1\t3\tfemale\t26.0\t0\t0\t7.925\tS\tThird\t...                                                                 
3\t1\t1\tfemale\t35.0\t1\t0\t53.1\tS\tFirst\tw...                                                                 
4\t0\t3\tmale\t35.0\t0\t0\t8.05\tS\tThird\tman... 
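To make the comma-in-text point concrete, here is a small sketch with a made-up DataFrame; the greeting column and file names are only for illustration. With the default comma separator, pandas wraps the text in quotes, while a semicolon separator needs no quoting at all.

# A tiny made-up DataFrame whose text column contains a comma
text_df = pd.DataFrame({'id': [1, 2],
                        'greeting': ['Hello, world', 'Good morning']})

# With the default comma separator, 'Hello, world' gets wrapped in quotes
text_df.to_csv('greetings_comma.csv', index=False)

# With a semicolon separator, the comma inside the text needs no quoting
text_df.to_csv('greetings_semicolon.csv', sep=';', index=False)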

The third parameter, na_rep, controls how missing values are represented. Common choices are to leave the cell empty or to write ‘NaN’ or ‘NA’, but you can also insert a custom string. In the following example, I’ll write ‘Datacamp’ for every missing value.

new_df.to_csv('new_titanic.csv', na_rep='Datacamp')
new_df_datacamp = pd.read_csv('new_titanic.csv')
# Count the missing values in the original data;
# each of them appears as 'Datacamp' in the exported CSV
missing_values = df.isnull().sum()
print(missing_values)
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0

Handling columns and indexes

Let’s see how the missing values are represented for the age column:

print(new_df_datacamp['age'][:10])
0     0.2711736617240512
1     0.4722292033174164
2    0.32143754712239253
3     0.4345312892686604
4     0.4345312892686604
5               Datacamp
6     0.6732847449107816
7    0.01985423473234481
8    0.33400351847197784
9    0.17064589092736868
Name: age, dtype: object

If you want fewer decimals, use the float_format parameter with a printf-style format string. Here, '%.3f' keeps only three decimals.

new_df.to_csv('new_titanic.csv', na_rep='Datacamp', float_format='%.3f')
new_df_datacamp = pd.read_csv('new_titanic.csv')
print(new_df_datacamp['age'][:10])
0       0.271
1       0.472
2       0.321
3       0.435
4       0.435
5    Datacamp
6       0.673
7       0.020
8       0.334
9       0.171
Name: age, dtype: object

If you don’t want to see the index, set the index parameter to False.

You can also decide whether to keep the column names by setting the header parameter to True or False, or you can pass a list of strings to rename the columns.

The columns parameter lets you select which columns to write. The index_label parameter sets a name for the index column, or a list of names if you have multiple index levels.

new_df.to_csv(
    'new_titanic.csv', 
    index=True, 
    header=True, 
    columns=['age', 'alive'], 
    index_label='Index'
)
selected_df = pd.read_csv('new_titanic.csv')
print(selected_df.head())
  Index       age alive
      0  0.271174    no
      1  0.472229   yes
      2  0.321438   yes
      3  0.434531   yes
      4  0.434531    no

File opening, encoding, and compression

The mode parameter controls what happens when a CSV file with the same name already exists. With ‘w’ (the default), the new file replaces the old one. With ‘x’, the file is created only if no file with that name exists; otherwise pandas raises an error. Finally, with ‘a’, the new rows are appended to the end of the existing file, which is handy when you just want to add some rows.
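Here is a minimal sketch of the append mode, assuming new_titanic.csv already exists; extra_rows is a stand-in for whatever rows you want to add, and header=False keeps the column names from being written a second time.

# Hypothetical extra rows to add; here we just reuse the last 10 rows
extra_rows = new_df.tail(10)

# 'a' appends to the existing file; header=False avoids repeating
# the column names in the middle of the file
extra_rows.to_csv('new_titanic.csv', mode='a', header=False)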

The encoding parameter sets the encoding of the new file. If your data contains special characters, such as letters of other alphabets, you can set it to ‘utf-8’ or another encoding.
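For example, to be explicit about the encoding (a minimal sketch, with ‘utf-8’ as the usual safe choice):

# Write the file with an explicit UTF-8 encoding
new_df.to_csv('new_titanic.csv', encoding='utf-8')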

The compression parameter controls how the new file is compressed. By default, it’s set to ‘infer’, which means pandas picks the compression method from the file extension (for example .gz or .zip), but you can also specify the method explicitly.

# Infer the compression
new_df.to_csv('new_titanic.csv.gz')
# Specify the compression explicitly
new_df.to_csv('new_data.csv.gz', compression='gzip')
# Specify the compression via a dictionary
new_df.to_csv(
    'new_data.zip', 
    compression={
        'method': 'zip', 
        'archive_name': 'data.csv'
    }
)

How to use quoting

The quoting parameter controls which fields are wrapped in quotes. As I said earlier, when the separator is a comma and some cells also contain commas (such as free text), you need quoting to keep those cells intact.

import csv

# Default value: quotes only fields that contain special characters,
# such as the separator or a line break
new_df.to_csv('new_titanic.csv', quoting=csv.QUOTE_MINIMAL)
# Quotes all fields
new_df.to_csv('new_titanic.csv', quoting=csv.QUOTE_ALL)
# Quotes only fields with non-numeric values
new_df.to_csv('new_titanic.csv', quoting=csv.QUOTE_NONNUMERIC)
# No field is quoted (requires escapechar if a field contains the separator)
new_df.to_csv('new_titanic.csv', quoting=csv.QUOTE_NONE)

Chunksize and date-time data

The chunksize parameter specifies how many rows are written at once. If you leave the default value, all rows are written in a single pass, but you can specify a number, which can be useful when the file is very large.
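For instance, you could write the file in batches of 100 rows at a time; the number here is only an illustration.

# Write the CSV in chunks of 100 rows instead of all at once
new_df.to_csv('new_titanic.csv', chunksize=100)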

The date_format parameter sets the representation of datetime objects. You can specify it using the strftime format.

# Write DataFrame to CSV with dates formatted as 'MM/DD/YYYY'
new_df.to_csv('new_titanic.csv', date_format='%m/%d/%Y')
# You can set datetime objects up to seconds
new_df.to_csv('new_titanic.csv', date_format='%Y-%m-%d %H:%M:%S')

Conclusion

Hopefully, you will now be ready to export your DataFrame in the format that suits your needs!

I encourage you to keep learning. If you want to go further, the Data Manipulation with pandas course mentioned above is a good next step.


Author
Tudor Mărginean

I am finishing my master's degree in data science at the University of Bucharest, Romania, with a thesis about traffic flow prediction using various models based on Graph Neural Networks. The main skills I learned were data cleaning and visualization in Python and Power BI, preprocessing for various machine learning tasks, and building models with PyTorch (or TensorFlow). At the same time, as a PhD student in philosophy, I am teaching seminars on first-order logic while working on my thesis on Bayesian confirmation theory. I'm eager to learn more about data and building models and to teach what I learn.
