How to Analyze Data in Google Sheets With Python: A Step-By-Step Guide
Companies run on spreadsheets. From financial management and budgeting to project management and KPI reporting: every team relies on spreadsheets in some shape or form.
If you want to be effective as an analyst in this spreadsheet-driven environment, it’s crucial that you know how to programmatically access data in spreadsheets as well as write the results of your analyses back to spreadsheets.
By automating the process, you can meet your stakeholders where their data is stored and avoid the tedium of manually exporting and importing data into spreadsheets.
In this tutorial, we’ll cover all the necessary steps to access data that lives in a Google Sheet spreadsheet, analyze it, and write it back to a spreadsheet. We will do all of this with Python with no repetitive manual steps involved!
We’ll use DataCamp Workspace in this tutorial, as it has built-in functionality for securely storing secrets as environment variables, a feature we’ll use to store Google API credentials. To follow along, all you need is a Google account (to house the spreadsheet) and a DataCamp account (to use DataCamp Workspace), both of which you can create for free!
More specifically, the steps we will take are:
- Configure a Google Service Account
- Create a Google spreadsheet with data
- Create a new workspace
- Store your service account credentials in Workspace
- Read Google Sheets data
- Analyze the data in Python
- Write results to Google Sheets
Below, you’ll find the step-by-step instructions for how to analyze data in Google Sheets with Python.
1. Configure a Google Service Account
To programmatically access data in Google Sheets, you need to create a Google service account, which is a special kind of account that can be used by programs to access Google resources like a spreadsheet. You will use this service account to connect DataCamp Workspace to Google Sheets.
You only have to set up this Google service account once for every Google account that you want to access spreadsheets in; you can skip this step the next time you want to work with spreadsheets data in the same account.
Follow the steps below to create the service account and generate the necessary credentials.
- Make sure you’re signed in with your Google account.
- Navigate to the Google API Library
- Create a new project by clicking in the dropdown on the navbar.
- Search for the “Google Sheets API” and enable it. This can take up to 10 seconds.
- In the “APIs and services” navbar on the left, go to the “Credentials tab”
- Click on “+ CREATE CREDENTIALS” and select “Service Account”
- In the first step (service account details), provide a name for the service account, e.g., “gsheet-operator” and click on “Create and continue”
- In the second step, select the “Owner” role and click “Continue”
- In the third step, don’t change anything and click “Done”
- Once back on the Credentials page, click on the service account you just created.
- Go to the Keys tab, click “Add Key > Create new key”
- Choose “JSON,” then click “Create.” The JSON file with your service account credentials will automatically download to your computer.
You now have a service account and a JSON credentials file! Head over to your Downloads folder or wherever the JSON file was downloaded, open it up, and have a look. It should look something like this:
{
"type": "service_account",
"project_id": "<your-project-name>",
"private_key_id": "<something-private>",
"private_key": "-----BEGIN PRIVATE KEY-----\nM<some-very-private-stuff\n",
"client_email": "[email protected]",
"client_id": "123456789012345678901",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/gsheets-operator%40<project-name>.iam.gserviceaccount.com"
}
There’s a “client_email” field in there: [email protected]<google-project-name>.iam.gserviceaccount.com. Copy this email to your clipboard; you’ll need it in the next step.
2. Create a Google Spreadsheet With Data
Before you can analyze data in spreadsheets, you need to make sure you have a spreadsheet with data in it. If you don’t have a dataset lying around, you can start from a Google Sheet that we prepared for this tutorial: Open the example spreadsheet and once in the Google Sheet, click “File > Make a copy,” specify a name, and click “Make a copy.” If you already have a spreadsheet with data you want to analyze, just open up that spreadsheet.
Regardless of whether you’re working with a duplicate of the example spreadsheet or your own spreadsheet, you need to give the Google service account that you created in the first step access to the spreadsheet:
- Click “Share”
- Add the email of the service account that you copied to your clipboard in the previous step as an editor to the spreadsheet (e.g. [email protected]<google-project-name>.iam.gserviceaccount.com)
- Click “Send”
Alright, service account: check. Google spreadsheet with the right access: check. It’s time to write some Python code!
3. Create a New Workspace
In this tutorial, we’ll be using DataCamp Workspace, a modern data science notebook with Python and all common data science packages preconfigured. A data project inside Workspace is called a workspace. Let’s create one!
You can either create an empty workspace from scratch, which is useful if you want to write all the code yourself or create a workspace that already has most of the code in this tutorial included.
Run and edit the code from this tutorial online
Open Workspace4. Store Your Service Account Credentials in Workspace
To use the service account credentials JSON in your newly created workspace, you need to store it in Workspace.
You could copy and paste the contents of the JSON inside a code cell of your newly created workspace:
This is not a secure approach and not recommended. If you share your notebook with someone else, they can see the credentials in clear text. They would then be able to impersonate you and get access to resources this service account has access to.
This approach would also require you to repeat this code in every workspace that you want to use to access spreadsheet data. If you want to rotate the credentials, you’ll have to go through and edit every workspace.
A more secure and scalable approach is creating a so-called environment variable integration to store your JSON secret. When you connect this integration to your workspace, all the secrets you specified in the integration will be available in the workspace as an environment variable that you can easily read with Python code. In your workspace:
- Click the “Integrations” tab on the left-hand side
- Click on “Create Integration”
- Select “Environment variables”
- In the “Add Environment Variables” modal:
- Set “Name” to GOOGLE_JSON
- Set “Value” to the full contents of the service account JSON file that was downloaded. You can do this by opening the JSON file, selecting all, copying it to your clipboard, and then pasting it in the Value field.
- Set the “Integration name” to “Google Service Account”
After filling in all fields, click “Create,” “Next,” and finally, “Connect.” Your workspace session will restart, and GOOGLE_JSON will now be available as an environment variable in your workspace. You can verify this by creating a Python cell with the following code and running it:
import os
google_json = os.environ["GOOGLE_JSON"]
print(google_json)
If you want to reuse the same services account credentials in another workspace, you don’t need to set up the integration again: you can connect to the same integration in multiple workspaces.
5. Read Google Sheets Data
Now for the fun part: writing Python code to connect to the Google Sheets data.
Install and import the right packages
Start by installing and importing all the packages required to set up a connection to your Google API account. We’ll be using google-auth to authenticate against the Google API, gspread to easily interface with Google spreadsheets, and pandas to manipulate tabular data in Python.
%%capture
!pip install gspread
from google.oauth2 import service_account
import pandas as pd
import gspread
import json
import os
Set up a client with scoped credentials
Load the GOOGLE_JSON environment variable and store it in a variable:
import os
google_json = os.environ["GOOGLE_JSON"]
Create a credentials object from the service account credentials json:
service_account_info = json.loads(google_json)
credentials = service_account.Credentials.from_service_account_info(service_account_info)
Assign the right ‘API scope’ to the credentials so that it has the right permissions to interface with the Google Sheet spreadsheet:
scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']
creds_with_scope = credentials.with_scopes(scope)
Create a new client with these scoped credentials:
client = gspread.authorize(creds)
Read spreadsheets data into a DataFrame
With the client ready, we can read the data from the spreadsheet created in the first section of this article into a pandas DataFrame. A pandas DataFrame is a table-like data structure that you can use in Python to store and manipulate data.
First, create a new sheet instance from a spreadsheet URL:
spreadsheet = client.open_by_url('https://docs.google.com/spreadsheets/d/spreadsheet_id/edit#gid=0')
Next, zoom in on the worksheet (the tab in the Google spreadsheet) you want to get the data from. Let’s get the first workspace using the index 0:
worksheet = spreadsheet.get_worksheet(0)
First, get all the records as a JSON:
records_data = worksheet.get_all_records()
Next, convert the JSON records into a pandas DataFrame:
records_df = pd.DataFrame.from_dict(records_data)
records_df
Success!
6. Analyze the Spreadsheets Data in Python
You can now use Python to analyze the data you read in from the Google spreadsheet. The minimal code below assumes that you are working with the example spreadsheet. If you are working with another dataset, these steps won’t work. Check out DataCamp’s Data Manipulation with pandas course if you want to dive deeper!
Let’s calculate median funding by country:
median_funding_df = records_df.groupby("country")["funding"].median().reset_index()
median_funding_df.columns = ['country', 'median_funding']
median_funding_df
7. Write the Results to Google Sheets
If you want the spreadsheet users you’re collaborating with to see the fruits of your labor in a spreadsheet, you can write back the results of your Python analysis back to a new Google worksheet (i.e., tab) inside the original spreadsheet.
In this example, we create a new worksheet median_country if it doesn’t exist yet, and then insert the contents of the median_funding_df DataFrame, including the header with column names. This code was generated with the help of the AI Assistant!
# Convert the median_funding_df DataFrame to a list of lists
medians_list = median_funding_df.values.tolist()
# Add the header row to the list
medians_list.insert(0, median_funding_df.columns.tolist())
# Check if the 'median_by_country' sheet already exists
sheet_titles = [sheet.title for sheet in spreadsheet.worksheets()]
sheet_name = "median_by_country"
if sheet_name not in sheet_titles:
# Create a new worksheet named 'median_by_country' in the Google Sheet
medians_sheet = spreadsheet.add_worksheet(title=sheet_name, rows=len(medians_list), cols=len(medians_list[0]))
else:
medians_sheet = spreadsheet.worksheet(sheet_name)
# Write the medians_list to the new worksheet
medians_sheet.clear()
medians_sheet.insert_rows(medians_list, row=1)
If you visit your spreadsheet again, you’ll see a new worksheet with the data from the DataFrame!
Conclusion
The initial setup before you can access data in Google Sheets with Python can take a while. However, once things are properly configured, accessing data from Google Sheets is almost as easy as querying a SQL database or reading in a CSV. The big advantage, though, is that you can meet your stakeholders where they are: it’s likely that some of your colleagues swear by spreadsheets.
Out with the times of laborious CSV exports and in with the times of powerful, automatic workflows that read from and write to spreadsheets without ever leaving DataCamp Workspace.
DataCamp Workspace
Skip the installation process, and get started with Python on your browser using DataCamp Workspace