CRM Sales Opportunities
An example of analysis using PostgreSQL with various joins, common table expressions, window functions and subqueries.
Follow-up visualizations were done in Python's matplotlib and seaborn.
import matplotlib.pyplot as plt
import seaborn as sns
!pip install squarify
import squarifyDataset
The dataset used is CRM Sales Opportunities obtained from MavenAnalytics:
B2B sales pipeline data from a fictitious company that sells computer hardware, including information on accounts, products, sales teams, and sales opportunities.
The dataset consists of four main data tables (saved in csv format), and a data_dictionary table.
EXPLORATORY QUERIES
Structure of tables
The first few rows of each table are shown below:
SELECT * FROM sales_pipeline.csv LIMIT 5;SELECT * FROM accounts.csv LIMIT 5;SELECT * FROM products.csv LIMIT 5;SELECT * FROM sales_teams.csv LIMIT 5;Table sales_pipeline will be used as the fact table. Tables accounts, products and sales_teams will be the dimension tables. The fields acount, product and sales_agent respectively as the key fields.
As a side note, we can already see that in accounts, in the field sector, technology is misspelt as "technolgy". Since we are going to perform table join, we should check if the key fields are uniform between tables.
Range of sale dates
It should be checked if any of the sales agents had joined/left the organization throughtout the year, as it would limit how their efficiency can be calculated. We can use a subquery to find the first and last date for each agent, from which we get the minimum and maximum value of first and last dates.
SELECT
MIN(first_date) AS min_start_date,
MAX(first_date) AS max_start_date,
MIN(last_date) AS min_end_date,
MAX(last_date) AS max_end_date
FROM
(SELECT
sales_agent,
MIN(close_date) AS first_date,
MAX(close_date) AS last_date
FROM sales_pipeline.csv
GROUP BY sales_agent) AS subquery;For every sales agent appearing in the sales_pipeline table, the time frame begins between 2017-03-01 and 2017-03-10, and ends between 2017-12-25 and 2017-12-31. Therefore all the agents (and consequently, sales teams and regional offices) were active roughly throughout the analyzed time frame.
Field matches: product
The product field will be used as key value for performing table joining. We will use a query to check if values match between products and sales_pipelines tables:
SELECT
DISTINCT p.product AS Products_table_names,
sp.product AS SalesPipeline_table_names
FROM products.csv AS p
FULL JOIN sales_pipeline.csv AS sp
ON p.product = sp.product
WHERE Products_table_names IS null OR SalesPipeline_table_names IS null;There is a consistent difference in spelling of GTX Pro between the 2 tables. The issue will be bypassed by removing all whitespaces from the product name in both tables when joining on the product field.