
Probability Mass Function: A Guide to Discrete Distributions

Learn how the probability mass function defines discrete probability distributions. Explore its properties, examples, and differences from probability density functions.
Jun 20, 2025  · 8 min read

In probability and statistics, the likelihood of an outcome for discrete random variables is quantified by the probability mass function (PMF), whereas for continuous variables, we use the probability density function (PDF).

Whether modeling coin flips, equipment failures, or user clicks on a webpage, the PMF helps us quantify the likelihood of different outcomes.

Understanding the Probability Mass Function

The probability mass function formula

Mathematically, the probability mass function of a discrete random variable X is defined as:

p(x) = P(X = x)

Where:

  • X is a discrete random variable,
  • x is a value that X can take,
  • p(x) is the probability that X equals x.

Two critical conditions must be met for a valid PMF:

1. Non-negativity: p(x) ≥ 0 for every value x that X can take.

2. Total probability must equal one: Σ p(x) = 1, where the sum runs over all possible values of X.

These conditions ensure that all assigned probabilities make logical and mathematical sense. The PMF provides a complete description of the distribution of X, making it possible to compute expected values, variances, and other statistical measures.

Definition and key properties

The PMF has several defining properties that distinguish it:

  • Discreteness: The PMF only applies to discrete random variables, those that can take on countable values (like 0, 1, 2, ...).
  • Normalization: The sum of the PMF over all possible values of X must be exactly 1.
  • Individual probability assignment: Each possible value x is assigned a specific probability p(x), which captures how likely that particular outcome is.

For example, the PMF for rolling a fair six-sided die is:

p(x) = 1/6, for x ∈ {1, 2, 3, 4, 5, 6}

Here, each value from 1 to 6 has an equal chance of appearing, and the sum of all probabilities is 1.
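
These two conditions are easy to verify in code. Here is a minimal sketch in Python that checks them for the die PMF:

# PMF of a fair six-sided die
pmf = {x: 1/6 for x in range(1, 7)}

# Condition 1: non-negativity
assert all(prob >= 0 for prob in pmf.values())

# Condition 2: total probability equals one (allowing for floating-point rounding)
assert abs(sum(pmf.values()) - 1.0) < 1e-9

print("Valid PMF:", pmf)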

Probability Mass Function vs. Probability Density Function

Let’s look at the key differences between PMFs and PDFs and how they model discrete and continuous distributions, respectively:

| Characteristic | PMF | PDF |
| --- | --- | --- |
| Variable type | Discrete | Continuous |
| Value output | Probability of an exact outcome: P(X = x) | Probability density f(x), which is not itself a probability |
| Summation rule | Σ p(x) = 1 (sum over all x) | ∫ f(x) dx = 1 (integral over all x) |

PMF example

Let’s take a classic PMF example: tossing a fair coin (a Bernoulli trial):

P(X = 1) = 0.5 (heads) and P(X = 0) = 0.5 (tails)

Let’s perform a Bernoulli trial in Python to understand this better.

import matplotlib.pyplot as plt
from scipy.stats import bernoulli
# Parameters
p = 0.5  # Probability of Heads
# PMF for a fair coin
x = [0, 1]  # 0 = Tails, 1 = Heads
pmf_values = bernoulli.pmf(x, p)

# Plotting
plt.bar(x, pmf_values, tick_label=['Tails (0)', 'Heads (1)'])
plt.title('PMF of a Fair Coin Toss (Bernoulli Trial)')
plt.ylabel('Probability')
plt.xlabel('Outcome')
plt.grid(axis='y', linestyle='--')
plt.show()

PDF example

Here’s a PDF example: measuring a person’s height (continuous variable).

The PDF might show that the probability density is highest around 170 cm, but the probability of any exact height is technically 0.

Again, let’s understand this with a code example.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Parameters
mu = 170   # mean height
sigma = 10  # standard deviation

# Continuous range of heights
x = np.linspace(130, 210, 500)

pdf_values = norm.pdf(x, mu, sigma)

# Plotting
plt.plot(x, pdf_values, label='PDF')
plt.title('PDF of Human Height (Normal Distribution)')
plt.xlabel('Height (cm)')
plt.ylabel('Probability Density')
plt.grid(True)
plt.axvline(mu, color='red', linestyle='--', label='Mean (170 cm)')
plt.legend()
plt.show()

While PMFs assign actual probabilities to specific outcomes, PDFs describe the relative likelihood of outcomes within an interval.
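
To make this contrast concrete, the short sketch below reuses the height distribution from the PDF example: the value norm.pdf returns at a point is a density, while an interval gets an actual probability from the CDF (the integral of the PDF):

from scipy.stats import norm

mu, sigma = 170, 10  # same mean and standard deviation as above

# The PDF value at exactly 170 cm is a density, not a probability
print("Density at 170 cm:", norm.pdf(170, mu, sigma))

# P(X = 170) is 0 for a continuous variable; intervals carry the probability
p_interval = norm.cdf(175, mu, sigma) - norm.cdf(165, mu, sigma)
print("P(165 <= X <= 175):", p_interval)  # about 0.383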

Common Discrete Distributions Using PMF

Common examples of discrete probability distributions are the Bernoulli, binomial, geometric, and Poisson distributions. Let's take a look at each.

PMF of the Bernoulli distribution

The Bernoulli distribution models a single experiment with only two possible outcomes: success (1) and failure (0):

p(x) = p^x (1 − p)^(1 − x), for x ∈ {0, 1}

Where p is the probability of success.

Here is an example: tossing a fair coin where success = heads gives p = 0.5, so P(X = 1) = P(X = 0) = 0.5.

PMF of the binomial distribution

The Binomial distribution generalizes the Bernoulli to multiple trials:

P(X = x) = C(n, x) p^x (1 − p)^(n − x), for x = 0, 1, ..., n

Where:

  • n is the number of independent trials,
  • p is the probability of success in each trial,
  • x is the number of successes,
  • C(n, x) = n! / (x!(n − x)!) is the binomial coefficient, counting the ways to choose which x of the n trials are successes.

One example would be counting defective items in a batch of products.
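
To evaluate this PMF in code, here is a minimal sketch using scipy.stats.binom; the batch size of 20 and the 5% defect rate are made-up numbers for illustration:

from scipy.stats import binom

n, p = 20, 0.05  # hypothetical batch of 20 items with a 5% defect rate

# P(X = x) for x defective items in the batch
for x in range(4):
    print(f"P(X = {x}) = {binom.pmf(x, n, p):.4f}")

# The PMF sums to 1 over its full support 0..n
print("Total:", sum(binom.pmf(x, n, p) for x in range(n + 1)))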

PMF of the Geometric distribution

The Geometric distribution is used to model the number of failures before the first success in repeated independent Bernoulli trials:

P(X = x) = (1 − p)^x p, for x = 0, 1, 2, ...

One common example of this would be the number of tails flipped before the first head appears.
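
One caveat when coding this: scipy.stats.geom counts the number of trials up to and including the first success (support starting at 1), so shifting it with loc=-1 matches the failures-before-first-success form used above. A minimal sketch:

from scipy.stats import geom

p = 0.5  # probability of heads on each flip

# loc=-1 shifts scipy's geometric distribution from "trials until first success"
# (support 1, 2, ...) to "failures before first success" (support 0, 1, ...)
for x in range(4):
    print(f"P({x} tails before the first head) = {geom.pmf(x, p, loc=-1):.4f}")
    # each value matches (1 - p)**x * p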

PMF of the Poisson distribution

The Poisson distribution models the number of events in a fixed interval of time or space:

P(X = x) = (λ^x e^(−λ)) / x!, for x = 0, 1, 2, ...

Where λ is the expected number of events per interval.

An example of this is the number of customer arrivals at a service center per hour.
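
Assuming a hypothetical average of λ = 4 arrivals per hour, a quick sketch with scipy.stats.poisson evaluates this PMF:

from scipy.stats import poisson

lam = 4  # hypothetical average of 4 arrivals per hour

# P(X = x) customers arriving in a given hour
for x in range(7):
    print(f"P(X = {x}) = {poisson.pmf(x, lam):.4f}")

# Probability of at most 2 arrivals, via the CDF
print("P(X <= 2) =", poisson.cdf(2, lam))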

Visualizing Probability Mass Functions

PMFs are visualized using bar charts, where:

  • The x-axis represents the possible values of the random variable.
  • The y-axis represents the probability assigned by the PMF.

Here is a table showing an example: rolling a fair die.

| X | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| P(X) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |

This uniform PMF would appear as a flat bar chart, where all outcomes are equally likely.
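
Here is one way to draw that chart, following the same matplotlib pattern as the coin-toss example earlier:

import matplotlib.pyplot as plt

# Fair die: every face has probability 1/6
faces = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

plt.bar(faces, probs)
plt.title('PMF of a Fair Six-Sided Die')
plt.xlabel('Outcome')
plt.ylabel('Probability')
plt.ylim(0, 0.25)
plt.grid(axis='y', linestyle='--')
plt.show()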

Such visualizations make it easier to:

  • Interpret distributions intuitively,
  • Compare relative probabilities,
  • Spot skewed or symmetric distributions.

Applications of Probability Mass Functions

PMFs are used across various domains:

Statistics

For calculating expected values:

E[X] = Σ x · p(x)

and variances:

Var(X) = Σ (x − E[X])^2 · p(x) = E[X^2] − (E[X])^2

where both sums run over all possible values x.
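
Applied to the fair die from earlier, a minimal sketch of both calculations:

# PMF of a fair six-sided die
pmf = {x: 1/6 for x in range(1, 7)}

# Expected value: E[X] = sum of x * p(x)
mean = sum(x * p for x, p in pmf.items())

# Variance: Var(X) = sum of (x - E[X])^2 * p(x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(f"E[X] = {mean:.4f}")        # 3.5
print(f"Var(X) = {variance:.4f}")  # about 2.9167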

Machine learning

In classification algorithms (e.g., Naive Bayes), PMFs describe the likelihoods of categorical features. Probabilistic models like Hidden Markov Models also depend on PMFs for state transitions and emissions.
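
As a rough illustration of the Naive Bayes idea, each class gets an empirical PMF over a categorical feature, estimated from observed counts; the toy data below is made up:

from collections import Counter

# Made-up categorical feature values observed within one class
observations = ['red', 'red', 'blue', 'green', 'red', 'blue']

# Empirical PMF: the relative frequency of each category
counts = Counter(observations)
total = sum(counts.values())
pmf = {value: count / total for value, count in counts.items()}

print(pmf)  # {'red': 0.5, 'blue': 0.333..., 'green': 0.166...}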

Reliability engineering

PMFs help determine the probability of a system or component failing at a specific time step.

Economics and finance

PMFs model market scenarios or investment outcomes where the number of possible outcomes is finite and distinct.

Bayesian inference

PMFs serve as prior distributions in discrete Bayesian models. When new evidence is observed, the prior PMF is updated to a posterior PMF, making Bayesian updating tractable for discrete problems.
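
Here is a small sketch of one such update; the three candidate coin biases and the uniform prior are illustrative assumptions. After observing a head, each prior weight is multiplied by the likelihood of heads under that hypothesis and renormalized:

# Hypothetical candidate biases for a coin, with a uniform prior PMF over them
hypotheses = [0.3, 0.5, 0.7]           # P(heads) under each hypothesis
prior = {h: 1/3 for h in hypotheses}

# Observe one head: posterior is proportional to prior * likelihood
unnormalized = {h: prior[h] * h for h in hypotheses}
evidence = sum(unnormalized.values())
posterior = {h: w / evidence for h, w in unnormalized.items()}

print(posterior)  # mass shifts toward the head-favoring hypotheses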

If you want to learn about how conditional probabilities work, check out our Conditional Probability: A Close Look tutorial.

Monte Carlo Simulations

PMFs are also essential in Monte Carlo simulations, especially for systems with discrete and probabilistic outcomes. These simulations use random sampling from a PMF to estimate a model's behavior or performance over many iterations.
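
For example, here is a minimal sketch that samples from the fair-die PMF with numpy and compares the Monte Carlo estimate of the mean against the exact E[X] = 3.5:

import numpy as np

rng = np.random.default_rng(seed=42)

# Fair-die PMF
values = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

# Draw many samples from the PMF and estimate the expected value
samples = rng.choice(values, size=100_000, p=probs)
print("Monte Carlo estimate of E[X]:", samples.mean())  # close to 3.5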

Conclusion

The probability mass function provides the mathematical structure for modeling discrete random variables, unlike the PDF, which deals with continuous variables. PMFs enable powerful modeling across domains, from basic coin tosses to complex machine learning systems.

Key takeaways:

  • PMFs are non-negative and sum to 1 across all possible values.
  • They define the shape and characteristics of discrete probability distributions.
  • PMFs are essential in various applications, from classical statistics to modern AI and finance.
  • In Bayesian inference and Monte Carlo methods, PMFs are central to updating beliefs and simulating uncertain systems.

For those diving deeper into probability theory, exploring cumulative distribution functions (CDFs) and expected value calculations will further enrich your understanding of how uncertainty can be quantified and leveraged in decision-making.


Author: Vidhi Chugh

I am an AI Strategist and Ethicist working at the intersection of data science, product, and engineering to build scalable machine learning systems. Listed as one of the "Top 200 Business and Technology Innovators" in the world, I am on a mission to democratize machine learning and break the jargon for everyone to be a part of this transformation.

PMF FAQs

What is a probability mass function?

A PMF assigns a probability to each possible outcome of a discrete random variable, with all probabilities summing to 1.

How does a PMF differ from a PDF?

A PMF applies to discrete variables, while a PDF applies to continuous variables.

Can a PMF be negative?

No, a PMF must always be non-negative.

What are common distributions that use PMFs?

Examples include the Bernoulli, Binomial, Geometric, and Poisson distributions.

How is a PMF used in real-world applications?

PMFs are widely used in machine learning, finance, reliability engineering, and statistical modeling.
