Skip to content

CRM Sales Opportunities

An example of analysis using PostgreSQL with various joins, common table expressions, window functions and subqueries.

Follow-up visualizations were done in Python's matplotlib and seaborn.

import matplotlib.pyplot as plt
import seaborn as sns
!pip install squarify
import squarify

Dataset

The dataset used is CRM Sales Opportunities obtained from MavenAnalytics:

B2B sales pipeline data from a fictitious company that sells computer hardware, including information on accounts, products, sales teams, and sales opportunities.

The dataset consists of four main data tables (saved in csv format), and a data_dictionary table.

EXPLORATORY QUERIES

Structure of tables

The first few rows of each table are shown below:

Spinner
DataFrameas
df3
variable
SELECT * FROM sales_pipeline.csv LIMIT 5;
Spinner
DataFrameas
df4
variable
SELECT * FROM accounts.csv LIMIT 5;
Spinner
DataFrameas
df5
variable
SELECT * FROM products.csv LIMIT 5;
Spinner
DataFrameas
df6
variable
SELECT * FROM sales_teams.csv LIMIT 5;

Table sales_pipeline will be used as the fact table. Tables accounts, products and sales_teams will be the dimension tables. The fields acount, product and sales_agent respectively as the key fields.

As a side note, we can already see that in accounts, in the field sector, technology is misspelt as "technolgy". Since we are going to perform table join, we should check if the key fields are uniform between tables.

Range of sale dates

It should be checked if any of the sales agents had joined/left the organization throughtout the year, as it would limit how their efficiency can be calculated. We can use a subquery to find the first and last date for each agent, from which we get the minimum and maximum value of first and last dates.

Spinner
DataFrameas
exp1
variable
SELECT 
	MIN(first_date) AS min_start_date,
	MAX(first_date) AS max_start_date,
	MIN(last_date) AS min_end_date,
	MAX(last_date) AS max_end_date
FROM 
	(SELECT 
		sales_agent, 
		MIN(close_date) AS first_date, 
		MAX(close_date) AS last_date
	FROM sales_pipeline.csv
	GROUP BY sales_agent) AS subquery;

For every sales agent appearing in the sales_pipeline table, the time frame begins between 2017-03-01 and 2017-03-10, and ends between 2017-12-25 and 2017-12-31. Therefore all the agents (and consequently, sales teams and regional offices) were active roughly throughout the analyzed time frame.

Field matches: product

The product field will be used as key value for performing table joining. We will use a query to check if values match between products and sales_pipelines tables:

Spinner
DataFrameas
df2
variable
SELECT 
	DISTINCT p.product AS Products_table_names, 
	sp.product AS SalesPipeline_table_names 
FROM products.csv AS p
FULL JOIN sales_pipeline.csv AS sp
ON p.product = sp.product
WHERE Products_table_names IS null OR SalesPipeline_table_names IS null;

There is a consistent difference in spelling of GTX Pro between the 2 tables. The issue will be bypassed by removing all whitespaces from the product name in both tables when joining on the product field.