Pricing Optimizing in SQL Code-Along - April 1, 2025
The Business Problem
Businesses care a lot about the price of their products.
- When the products are priced too low, they lose tons of revenue.
- When the products are priced too high, they lose... tons of revenue.
That said, even though business leaders know that finding the correct price of a product is extremely important, they often have no analytical tools to figure out pricing optimization. Sometimes, leaders think it sounds too complex.
The Goal of the Code-along
By the end of this code-along, I guarantee you will be able to take any dataset with two features:
price
quantity_sold
and perform a powerful optimization technique in order to find the price that maximizes revenue.
Having the analytical skill to solve this optimization problem is going to have real results for your company, and it's sure to make you stand out as a data analyst or a data scientist. What is more, the technique is extremely versatile; it applies to all kinds of companies and industries, whether you are selling watches, tires, or shoes.
This code-along will also be useful if you are interested in creating your own project to demonstrate your understanding of SQL, statistics, algebra, calculus, as well as showing business acumen.
The questions to answer
- What is the price that you can sell your product for that maximizes total revenue?
- What revenue can you expect after you optimize the price?
The Parts of the Code-along
This project will have three parts:
Part 1: Performing linear regression in SQL
- The business case: We will build a model to describe quantity sold as a function of price.
- The SQL learning: We will also practice SQL aggregate functions and common table expressions (CTEs).
Part 2: Understanding price, quantity sold, and revenue
- The business case: We will then explore the use of a quadratic regression technique to describe revenue in terms of only price.
- The SQL learning: We will practice using SQL subqueries.
Part 3: Finding the price that maximizes revenue
- The business case: We will then learn how to find the derivative of our function, which is the theoretical price that maximizes revenue.
- The SQL learning: We will learn how to report the results of our optimization.
SQL Refresher
Before we dive in, let's start off with a refresher on some common SQL functions that we will be using in this tutorial.
Essential aggregate functions
COUNT(*)
β Total number of rows (used for N)SUM()
β Sum of values or expressions, e.g. SUM(price * quantity_sold)AVG()
β Mean of a column, needed for centering and intercept calculationSTDDEV()
β Standard deviation, used in slope computation via correlationCORR(x, y)
β Pearson correlation between two variables
Specialized linear regression functions (DuckDB)
REGR_SLOPE(y, x)
β Returns slope directlyREGR_INTERCEPT(y, x)
β Returns intercept directly
Other Tools
WITH
clause β Common Table Expressions (CTEs), used for organizing and reusing subqueriesAS
β Aliasing columns and subqueries for readability and reuse
Mathematical and Transform Functions
LN()
β Natural log (log base e), used for log-transforming quantity_soldEXP()
β Exponential function, used to "undo" the log and model predicted quantity
Our Dataset
This dataset is something I created myself. It has two columns:
price
: this column represents the price of the products soldquantity_sold
: the column represents the number of products sold at a given price
The dataset has 200 rows. Each row represents a day of sales showing how many products we sold for a given price.
Getting Started
Now, let's get started!
Part 0: Exploratory data analysis
The very first thing is exploring our dataset. We are interested in two variables: price
and quantity_sold
. So, let's start by choosing our columns with SELECT
and choosing our dataset with FROM
.
Looking at our table
As a first step, we can look at our demand_curve_data
table.
Instructions
- Use
SELECT
to chooseprice
andquantity_sold
from the dataset named"demand_curve_data.csv"
SELECT ROUND(price, 2), quantity_sold
FROM 'demand_curve_data.csv';
Data visualization
Data visualization is an important part of exploratory data analysis. And DataLab comes equipped with really great visuals that make data exploration easy. Here, I'm using DataLab's built-in UI to create a scatterplot, no code required, and this scatterplot, as you can see, is integrating perfectly into our SQL workflow.
This scatterplot let's us see two things, one obvious and one not obvious:
- There's an inverse relationship between
price
andquantity_sold
. - The relationship is not entirely linear, and the relationship between
quantity_sold
andprice
changes for different ranges ofprice
.
Instructions
- Use
SELECT
to chooseprice
andquantity_sold
from the dataset named"demand_curve_data.csv"
- Use DataLab to create a nice scatterplot
SELECT price, quantity_sold
FROM 'demand_curve_data.csv';
Part 1: Performing linear regression in SQL
Now that we have completed the first step, which is to look at the data and create a first visual, we can move on to creating a linear model. Here are the two parts:
- Step 1: The first step is to review the formulas for the slope and intercept of a simple linear regression.
- Step 2: The next step is to use the SQL aggregate functions to find the parts we will need so we can plug values into those equations.
Linear regression equations:
We can describe the slope and intercept in this way:
- The slope can be expressed at the correlation (
) multiplied by the quotient of the standard deviation of over the standard deviation of - The intercept can then be expressed as the average of
minus the product of the slope multiplied by the average of .
Here are those equations.
Personally, I rather like this set of equations because it shows you that, in simple linear regression, the slope is the correlation coefficient but back onto the scale of the data, and it also shows you that the regression line goes through the center of mass, which gives the regression its balancing property.
β
β