CRM Sales Opportunities

An example of analysis using PostgreSQL with various joins, common table expressions, window functions and subqueries.

Follow-up visualizations were done in Python's matplotlib and seaborn.

import matplotlib.pyplot as plt
import seaborn as sns
!pip install squarify
import squarify

Dataset

The dataset used is CRM Sales Opportunities obtained from MavenAnalytics:

B2B sales pipeline data from a fictitious company that sells computer hardware, including information on accounts, products, sales teams, and sales opportunities.

The dataset consists of four main data tables (saved in csv format), and a data_dictionary table.

EXPLORATORY QUERIES

Structure of tables

The first few rows of each table are shown below:

DataFrameas

df3

variable

SELECT * FROM sales_pipeline.csv LIMIT 5;

DataFrameas

df4

variable

SELECT * FROM accounts.csv LIMIT 5;

DataFrameas

df5

variable

SELECT * FROM products.csv LIMIT 5;

DataFrameas

df6

variable

SELECT * FROM sales_teams.csv LIMIT 5;

Table sales_pipeline will be used as the fact table. Tables accounts, products and sales_teams will be the dimension tables. The fields acount, product and sales_agent respectively as the key fields.

As a side note, we can already see that in accounts, in the field sector, technology is misspelt as "technolgy". Since we are going to perform table join, we should check if the key fields are uniform between tables.

Range of sale dates

It should be checked if any of the sales agents had joined/left the organization throughtout the year, as it would limit how their efficiency can be calculated. We can use a subquery to find the first and last date for each agent, from which we get the minimum and maximum value of first and last dates.

DataFrameas

exp1

variable

SELECT 
	MIN(first_date) AS min_start_date,
	MAX(first_date) AS max_start_date,
	MIN(last_date) AS min_end_date,
	MAX(last_date) AS max_end_date
FROM 
	(SELECT 
		sales_agent, 
		MIN(close_date) AS first_date, 
		MAX(close_date) AS last_date
	FROM sales_pipeline.csv
	GROUP BY sales_agent) AS subquery;

For every sales agent appearing in the sales_pipeline table, the time frame begins between 2017-03-01 and 2017-03-10, and ends between 2017-12-25 and 2017-12-31. Therefore all the agents (and consequently, sales teams and regional offices) were active roughly throughout the analyzed time frame.

Field matches: product

The product field will be used as key value for performing table joining. We will use a query to check if values match between products and sales_pipelines tables:

DataFrameas

df2

variable

SELECT 
	DISTINCT p.product AS Products_table_names, 
	sp.product AS SalesPipeline_table_names 
FROM products.csv AS p
FULL JOIN sales_pipeline.csv AS sp
ON p.product = sp.product
WHERE Products_table_names IS null OR SalesPipeline_table_names IS null;

There is a consistent difference in spelling of GTX Pro between the 2 tables. The issue will be bypassed by removing all whitespaces from the product name in both tables when joining on the product field.

‌
‌
‌

CRM Sales Opportunities

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}CRM Sales Opportunities

Dataset

EXPLORATORY QUERIES

Structure of tables

Range of sale dates

Field matches: product

CRM Sales Opportunities