Pivot Tables in Spreadsheets
In the world of big data, there are plenty of options for tools to work with; however,
pivot tables are one of the simplest and effective ways to organize and analyze your data, allowing you to gain valuable insights from the data.
At the most basic level, pivot tables can be created for any kind of data which is in a row and column format. The pivot table will read this data and will then allow you to filter, sort, subtotal, analyze your source data along with various other features that you could extract from the data.
According to Wikipedia, "A pivot table is a table of statistics that summarizes the data of a more extensive table (such as from a database, spreadsheet, or business intelligence program). This summary might include sums, averages, or other statistics, which the pivot table groups together in a meaningful way. Pivot tables are a technique in data processing. They enable a person to arrange and rearrange (or "pivot") statistics in order to draw attention to useful information".
Creating the pivot table is very easy since it requires only a few clicks of your mouse and then a few more to start organizing your data.
The pivot table contains four categories that you can drag the fields into to create a summarization of your data and gain more insights.
Below figure gives an excellent intuition about each attribute in the pivot table and how they help you to analyze and summarize your data in an effective manner.
About the Data!
You will start with a dataset that is well organized into a row and column format. This particular dataset shows the top 10 companies in the Fortune Global 500 in 2017 which is based on the highest gross revenues column.
You will be creating a pivot table from this dataset, which will allow you to analyze the data contained within.
Note: Please make sure you have a valid Gmail account, once you have an account all you need to do is click on this link and create a new blank spreadsheet.
The data is available here.
Create a Pivot Table!
There are two ways in which you can create a pivot table for this dataset.
Let's quickly understand the first way of creating a pivot table.
- Click on any of the cells within the dataset.
- Then click on the data tab and select Pivot Table button. This will allow the Pivot Table to select the dataset it believes you would intend to use.
- Then this will take you to the new worksheet tab, where the shell of the Pivot Table will be automatically added for you. On the left-hand side of your screen, you will see an outline showing where the Pivot Table will appear once it is built. The Pivot Table editor window will appear on the right-hand side and on the top of the pivot editor window you will see the data that is referenced.
Now is the correct time to verify the Pivot Table has included all of the intended data. In many cases, the Pivot Table will automatically select the entire dataset on your behalf, and there is nothing further you will need to do.
Let's quickly look at the second way of creating the Pivot Table:
- Here you will consider the complete dataset by manually selecting the entire dataset range including the headers, then select the Data button followed by Pivot Table. By choosing the dataset manually, you are bypassing "auto-select" capabilities of the pivot table; however, it will bring you to the new worksheet tab just like the previous method.
What if a row is completely missing from the Data?
If a row is completely missing from the Data, then you will have to go with the second way of creating a Pivot Table, i.e., select the complete dataset manually. If you select one cell in a dataset that has a missing row, then the Pivot Table will be created only till that row, and it will fail to take in to account the remaining rows of the dataset.
Note: If a row is not entirely missing and the dataset has multiple empty cells randomly dispersed throughout, you will still be able to create a Pivot Table by clicking anywhere within the dataset. Even though there is missing data, there will not be any empty rows or columns. So the pivot table will be able to select the full range automatically.
Organizing Rows and Columns of the Pivot Table!
So far you learned how to create a shell of the Pivot Table, now you will learn how to populate the Pivot Table with the Data. While there are many options to build your Pivot Table, the first choice is often selecting the rows and the columns.
You will start with the worksheet tab you created here, which shows the Pivot editor on the right-hand side and the blank shell of the Pivot Table within your worksheet.
- Firstly, you will select the category that will go in the row field. When you click on
Addnext to the row's section in the pivot table editor, you will see a dropdown list which will contain the column headers from your source data. Now when you select any of these categories, the pivot table will list all of the unique values found in that column of your source data. Let's choose
Industry, and the pivot table will show all of the industries contained in the original dataset.
Now, you have a list of Industries for the Top 10 companies in the world.
- Next, you will select the category for the columns field the same way. You will click
Addnext to the column's section in the pivot table editor, and in the dropdown menu, you will find the same list of column headers. Let's select
Year, and the pivot table will now show all of the years that are contained in the original dataset.
- You can select multiple rows or columns as well, by clicking
Addagain and selecting another category. In the rows section let's add the category
Nameso that the name of each company is listed underneath each industry. Also, make sure to re-arrange the order of the category in pivot table editor by dragging the
Namecategory above the
Well, you can see that the
Petroleum Refining industry is doing really well with four companies in the top ten!
Note: You can re-arrange the order of the rows by clicking and dragging the field names above or below each other. You can also click and drag them from rows to columns, or columns to rows. You can also remove a field by clicking on
X (cross) in the top-right corner of the field name!
Adding Values to Pivot Table!
Now that you know how to add rows and columns, the next step is to add values to your Pivot Table. The values that you select will make up the body of the Pivot Table.
For example: If your pivot table shows the industries of the top ten companies, you might wish to see the revenue for each.
The pivot table will then perform a calculation, such as
COUNT on your selected value, and it will populate the pivot table with the result of that calculation.
Let's see how that can be achieved in a few simple steps!
- You will click
Valuessection and select
Revenue. You will notice that when you select Revenue, it appears in your pivot table as SUM of Revenue which means that the pivot table is adding the revenue for every entry that falls under each industry. Let's look at the automotive industry which equals 495, you can think of this as the pivot table
Subtotalingall of the revenue for that industry. If you go back to the original source and add up all of the revenue for the automotive industry, it will equal the same total of 495.
Let's find out the
Profit for each industry!
- To find the
Profitfor each industry, all you have to do is click on add again, and select profit which will let you see multiple values at once. This will allow you to gain additional insight since the industry with highest revenues might not necessarily have the highest profit.
While rows and columns are often the first things you would want to select, the
Value field will often be the
primary consideration for your pivot table, as it contains the mathematical calculations that will help you analyze the data!
Filtering in Pivot Table
Like adding rows, columns, and values to your pivot table, you can also filter the data based on a condition to focus on a subset of the data and get some in-depth insights about it. Filtering in Pivot Table is similar to applying a
select query in
SQL wherein you select some portion of the data from the table based on a condition applied to a column.
In the pivot table, you will add
industryattribute as rows, and
revenueas the value.
Now comes the central part, in the
revenuecolumn and click on
showing all itemsand in that select
filter by a condition.In filter by condition, you will choose
greater than or equal to.
- You will put 240 as the value, which means that only the companies belonging to a specific industry will be shown which have a
sum of revenuegreater or equal to $240B. Your pivot table editor should look like the below image:
- Finally, let's find out which all companies have
revenueof greater or equal to $240B.
From the above figure, you can see that only a subset of data was selected as you filtered out the companies that I had revenue less than $240B.
This tutorial was a good introduction to Pivot Tables using Spreadsheets, if you were able to follow along easily, well done!
Try doing some experiments maybe with a more complex dataset, apply different attributes play with them and see if you can make some sense out of the data.
There is still a lot to cover, so why not take DataCamp's Pivot Tables with Spreadsheets course?