Every machine learning model you train is solving an optimization problem - and what you’re actually trying to solve is called the objective function.
In simple terms, it's a mathematical function that measures how "good" a solution is. It takes a set of inputs and outputs a single score. The goal is always to find the values that maximize or minimize that score. You'll find objective functions at the core of everything from linear programming to deep learning. It's one of those things you'll see everywhere once you understand how it works.
In this article, I'll explain what objective functions are, how they differ from loss and cost functions, and how they're used in machine learning and optimization.
Looking for a deep dive into deep learning that’s still relevant in 2026? Enroll in our Deep Learning in Python course to build a PyTorch portfolio.
What Is an Objective Function?
An objective function is a mathematical function that evaluates how good a solution is.
You feed it a set of inputs - model parameters, decision variables - and it returns a single number. That number tells you how well your current solution performs. The higher (or lower) that number, the better (or worse) your solution.
Optimization, in general terms, is the process of finding the inputs that push that number in the right direction. If you're minimizing, you want the smallest possible value. If you're maximizing, you want the largest. Either way, the objective function is what you're measuring against.
In plain terms, think of it as a scoring system. Every candidate solution gets a score, and your job is to find the one with the best score.
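To make the scoring idea concrete, here's a minimal sketch in Python. The candidate values and the target baked into the function are made up purely for illustration:

```python
# A toy objective function: score each candidate solution, keep the best.
# Lower is better here - squared distance from an assumed target of 3.0.
def objective(x):
    return (x - 3.0) ** 2

candidates = [0.0, 1.5, 2.9, 4.2]
best = min(candidates, key=objective)
print(best)  # 2.9 - the candidate with the lowest score
```

Real optimizers don't enumerate candidates like this, of course, but the role of the objective function is exactly the same: it assigns every candidate a score, and the search is for the best score.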
Objective Function vs Loss Function vs Cost Function
These three terms get used interchangeably all the time - but they don't mean exactly the same thing.
The objective function is the broadest term. It's any function you're trying to maximize or minimize. It doesn't have to involve error or predictions at all - it just defines what "better" means for your problem.
A loss function measures the error for a single training example - how far off your model's prediction is from the actual value. Mean squared error for one data point, for instance, is a loss function.
A cost function aggregates the loss across your entire dataset, usually by averaging it. So the cost function is what you're actually minimizing during training - it summarizes model performance across all examples, not just one.
In practice, most ML frameworks and papers use these terms loosely. You'll see "loss" used where "cost" would be more precise, and "objective" used to mean all three.
The distinctions matter when you're reading research papers. Context tells you which one the author is actually referring to.
For a more concrete comparison, see the table below:

Objective/Loss/Cost function comparison table
Objective Functions in Optimization
Every optimization problem has a goal and a set of limits.
The objective function defines the goal - what you're trying to maximize or minimize. The constraints define the limits - the boundaries your solution has to stay within. Together, they frame the problem.
Take a simple resource allocation example.
Say you're running a factory that produces two products, and you want to maximize profit. Your objective function captures total profit as a function of how many units of each product you make. Your constraints capture the limits - available raw materials, machine hours, labor capacity. The objective function tells you what to optimize and the constraints tell you what you're working with.
Linear programming is one of the most common settings where this structure appears. It's a method for optimizing a linear objective function subject to linear constraints, and it's used everywhere from logistics and scheduling to supply chains and finance. The math is well understood, and solvers can handle problems with thousands of variables.
It’s important to note that the objective function doesn't change what constraints exist - it just tells the solver what to go after. If you change the objective function, you’ll get a completely different solution, even with the same constraints.
Objective Functions in Machine Learning
In machine learning, the objective function defines what your model is actually learning to do.
Every time you train a model, you're running an optimization algorithm (think gradient descent, Adam, RMSProp) that adjusts model parameters to minimize or maximize the objective function. The model doesn't know anything about your problem. It only knows the score the objective function gives it, and it tries to improve that score with each update.
This means your choice of objective function shapes the outcome. It’s a good idea to try a few to see which works best for your case.
Mean squared error (regression)
Mean Squared Error (MSE) is the default objective function for regression problems. It measures the average squared difference between your model's predictions and the actual target values.

MSE formula
Squaring the differences makes all errors positive and penalizes large errors more than small ones. A prediction that's off by 10 contributes 100 to the sum - not just 10. This makes MSE sensitive to outliers, which is something to watch for in messy real-world data.
import numpy as np
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.2, 2.0, 6.5])
mse = np.mean((y_true - y_pred) ** 2)
print(f"MSE: {mse:.4f}")

MSE output
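To see the outlier sensitivity mentioned above in action, here's a small sketch where a single prediction is off by 10 (the data values are illustrative):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
clean_pred = np.array([2.8, 5.2, 2.0, 6.5])
outlier_pred = np.array([2.8, 5.2, 2.0, 17.0])  # one prediction off by 10

clean_mse = np.mean((y_true - clean_pred) ** 2)
outlier_mse = np.mean((y_true - outlier_pred) ** 2)
print(clean_mse)    # small - all errors are under 1
print(outlier_mse)  # dominated by the single 10^2 = 100 term
```

One bad prediction raises the average from well under 1 to over 25, which is why MSE-trained models can bend toward outliers in messy data.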
Cross-entropy loss (classification)
Cross-Entropy Loss is the standard objective function for classification problems. It measures how far your model's predicted probability distribution is from the true class distribution.

Cross-Entropy loss function
If your model assigns a high probability to the correct class, the loss is low. If it's confident but wrong, the loss is high, and it punishes that. This is what pushes the model to predict the right class and to be confident about it.
import numpy as np
# True labels (one-hot encoded)
y_true = np.array([1, 0, 0])
# Model's predicted probabilities
y_pred = np.array([0.7, 0.2, 0.1])
cross_entropy = -np.sum(y_true * np.log(y_pred))
print(f"Cross-Entropy Loss: {cross_entropy:.4f}")

Cross-Entropy output
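To see the "confident but wrong" penalty in action, here's a small comparison sketch (the probability values are illustrative):

```python
import numpy as np

y_true = np.array([1, 0, 0])  # the first class is correct

def cross_entropy(p):
    return -np.sum(y_true * np.log(p))

confident_right = np.array([0.9, 0.05, 0.05])
confident_wrong = np.array([0.05, 0.9, 0.05])

print(cross_entropy(confident_right))  # low loss
print(cross_entropy(confident_wrong))  # much higher loss
```

Because the loss is -log of the probability assigned to the true class, it grows without bound as that probability approaches zero, so confident mistakes are punished far more than uncertain ones.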
Log-likelihood
Log-likelihood is common in probabilistic and statistical models. The idea is to maximize the probability that your model's parameters produced the data you observed.

Log-likelihood formula
You work with the log of the likelihood rather than the likelihood itself because it turns a product of probabilities into a sum, which is much easier to compute and optimize.
In practice, most frameworks minimize the negative log-likelihood (NLL) instead of maximizing the log-likelihood. It's the same thing - just flipped so gradient descent can work with it.
import numpy as np
from scipy.stats import norm
# Observed data
data = np.array([1.2, 2.3, 1.8, 2.1, 1.9])
# Assumed model parameters
mu, sigma = 2.0, 0.5
# Compute negative log-likelihood
nll = -np.sum(norm.logpdf(data, loc=mu, scale=sigma))
print(f"Negative Log-Likelihood: {nll:.4f}")

Log-likelihood output
Training is, at its core, just optimization. Each forward pass computes the value of the objective function. Each backward pass computes gradients. And each parameter update moves the model in the direction that improves the score.
Convex vs. Non-Convex Objective Functions
Not all objective functions are created equal. Their shape determines how hard they are to optimize and whether you can trust the solution you find.
Linear objective functions
A linear objective function gives you a straight-line relationship between inputs and output. When you change any input by a fixed amount, the output changes by a fixed amount.
Linear functions are used in linear programming, where both the objective and the constraints are linear. This makes them the easiest class of objective functions to optimize, as solvers can reliably find the global optimum, even for large problems.
Nonlinear objective functions
A nonlinear objective function has a more complex relationship between inputs and output. Most real-world problems - and nearly all machine learning models - fall into this category.
MSE is nonlinear. Cross-entropy is nonlinear. Neural network loss surfaces are nonlinear. The added complexity lets these functions capture complex relationships in data, but it also makes optimization harder.
Convex versus non-convex functions
This is where things get interesting.
A convex function has a bowl-like shape. Any line segment drawn between two points on the curve sits above or on the curve. This guarantees that any local minimum is also the global minimum - meaning if your optimizer finds a bottom, it's the actual bottom.
A non-convex function has a more irregular shape - multiple valleys, plateaus, and saddle points. Optimizers can get stuck in a local minimum, a valley that looks like the bottom but isn't. Deep neural networks have highly non-convex loss surfaces, which is why training them requires careful tuning of learning rates, optimizers, and initialization.
Convex problems can be solved exactly; non-convex problems are usually solved approximately. Either way, the quality of your solution depends on your optimization strategy.
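To see how a non-convex surface traps an optimizer, here's a sketch of plain gradient descent on f(x) = x^4 - 3x^2 + x, which has two valleys. The function, starting points, and step settings are chosen purely for illustration:

```python
# f(x) = x^4 - 3x^2 + x has a deep left valley (the global minimum)
# and a shallower right valley (a local minimum).
def f(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    return 4 * x**3 - 6 * x + 1  # derivative of f

def descend(x, lr=0.01, steps=1000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

left = descend(-2.0)   # converges to the left valley near x = -1.30
right = descend(2.0)   # gets stuck in the right valley near x = 1.13
print(left, f(left))
print(right, f(right))
```

Both runs follow the gradient faithfully, yet they end at different answers with different objective values. The starting point alone decided which minimum was found, which is exactly the problem initialization and learning-rate choices try to manage in deep learning.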
For visual types, here’s a comparison between linear, nonlinear, convex, and non-convex objective functions:

Objective function type comparison
How Objective Functions Are Optimized
Once you have an objective function, you need a way to minimize or maximize it. That's where optimization algorithms come in.
The most common approach is gradient descent. The idea is to compute the gradient of the objective function with respect to your model's parameters, then take a small step in the direction that reduces the value. From there, repeat until the value stops improving.
The gradient is the vector of partial derivatives of the objective function with respect to each parameter.
It tells you the slope at your current position and which direction is "uphill." To minimize the function, you move in the opposite direction. To maximize it, you move with it.
This process is iterative, meaning that to get to the solution, you make a series of small updates, each one moving the parameters closer to the optimum. The size of each step is controlled by the learning rate. Too large of a value means you can overshoot the optimum, and too small means training takes longer.
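The learning-rate trade-off can be sketched with plain gradient descent on f(x) = x^2, whose minimum is at 0. The learning rates and step counts here are illustrative choices:

```python
# Gradient descent on f(x) = x^2 from x = 10, with different learning rates.
def minimize(lr, steps=20, x=10.0):
    for _ in range(steps):
        x -= lr * 2 * x  # gradient of x^2 is 2x
    return x

print(minimize(lr=0.1))   # steady progress toward 0
print(minimize(lr=0.01))  # too small - still far from 0 after 20 steps
print(minimize(lr=1.1))   # too large - overshoots and diverges
```

Each update multiplies x by (1 - 2·lr), so a small rate shrinks x slowly, a moderate rate shrinks it quickly, and a rate above 1 flips the sign and grows it without bound.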
In practice, most ML frameworks use variants of gradient descent that are faster and more stable than the basic version:
- Stochastic Gradient Descent (SGD) computes the gradient on a single random example per step instead of the full dataset
- Mini-batch gradient descent uses a small batch of examples per step - a middle ground between SGD and full-batch
- Adam adapts the learning rate for each parameter, which makes it more forgiving to tune
The objective function has to be differentiable, or at least mostly differentiable, for gradient-based methods to work. No derivative means no gradient, which means the optimizer has nothing to follow. That’s one key thing to remember.
Example of an Objective Function
Let's make this concrete with a simple linear regression problem.
Say you're predicting house prices based on square footage. You have a dataset of houses with known prices, and you want to fit a line through the data that minimizes prediction error. Your objective function is Mean Squared Error (MSE), whose formula you saw earlier.
The inputs are your model's parameters - the slope and intercept of the line. The output is a single number - the average squared error across all predictions. In this case, lower means better.
Here's what a potential Python implementation might look like:
import numpy as np
np.random.seed(42)
# Generate some fake housing data
square_footage = np.random.uniform(500, 3000, 100)
true_price = 150 * square_footage + 50000 + np.random.normal(0, 15000, 100)
def predict(x, slope, intercept):
    return slope * x + intercept
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
# Try two different parameter sets
slope_a, intercept_a = 150, 50000 # good fit
slope_b, intercept_b = 80, 100000 # bad fit
pred_a = predict(square_footage, slope_a, intercept_a)
pred_b = predict(square_footage, slope_b, intercept_b)
print(f"MSE (good fit): {mse(true_price, pred_a):,.2f}")
print(f"MSE (poor fit): {mse(true_price, pred_b):,.2f}")

MSE good versus bad fit
The first parameter set produces a much lower MSE, which means it fits the data better. That's exactly what the optimizer uses to decide which direction to move.
For visual types, consider the following image:

MSE example
You can see both fits against the data, and also the MSE surface across a range of slope values to see the objective function itself.
The left plot shows how the two parameter sets fit the data. The right plot shows the MSE surface across slope values - the objective function as a curve, with a clear minimum the optimizer is trying to find. Every step of gradient descent moves along this curve toward that minimum.
Constraints and Objective Functions
An objective function tells you what to optimize. Constraints tell you what you're allowed to do.
In most real problems, you can't just maximize or minimize as you wish. You're working within limits, such as a budget, a time window, or a physical capacity. These limits are called constraints, and they define the set of valid solutions your optimizer can choose from.
Take a manufacturing example.
Say you want to maximize profit across two product lines. Without constraints, the answer is to produce as much as possible. But you've got 500 hours of machine time and 1,000 units of raw material available. Those are your constraints. The objective function is the same (maximize profit), but the optimizer can only search within the region those constraints allow.
When you change the constraints, the optimal solution changes too, even if the objective function doesn't.
This structure of an objective function with a set of constraints is the foundation of constrained optimization. It's how linear programming works, how portfolio optimization works, and how many real-world planning problems are formulated.
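As a sketch, here's how the manufacturing example could be written as a linear program with SciPy. The profit and per-unit resource numbers are assumptions made up for illustration; only the 500 machine hours and 1,000 units of raw material come from the example above:

```python
from scipy.optimize import linprog

# Maximize profit from products A and B. linprog minimizes by default,
# so we negate the profit coefficients.
profit = [-40, -30]     # assumed: $40/unit of A, $30/unit of B
A_ub = [[2, 1],         # assumed machine hours used per unit of A, B
        [3, 4]]         # assumed raw material used per unit of A, B
b_ub = [500, 1000]      # limits: 500 machine hours, 1,000 units of material

res = linprog(c=profit, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)])  # can't produce negative units
print(res.x)     # optimal production quantities for A and B
print(-res.fun)  # maximum profit under the constraints
```

Changing `b_ub` (the constraints) moves the optimal quantities even though `profit` (the objective) stays the same, which is exactly the point made above.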
Conclusion
Every optimization problem - whether you're training a neural network, allocating resources, or fitting a regression model - comes down to one thing: a function you're trying to minimize or maximize.
The objective function is that function. It defines what "better" means, it guides every parameter update, and it determines what your model actually learns. If you get it right, your model will be able to solve the problem you have. But if you get it wrong, it solves a completely different problem - often without any obvious error to warn you.
Choosing the right objective function is a design decision in data science, as it shapes everything that follows. As a practitioner, feel free to experiment - there are many objective functions to choose from.
Not sure where to begin? Our Model Validation in Python and Hyperparameter Tuning in Python courses are both great places to start for beginner to intermediate data scientists.
FAQs
What is an objective function?
An objective function is a mathematical function that measures how good a solution is. You feed it a set of inputs, such as model parameters and decision variables, and it returns a single number. The goal is always to find the inputs that make that number as high or as low as possible.
What's the difference between an objective function and a loss function?
A loss function is a specific type of objective function used in machine learning to measure prediction error for a single training example. The objective function is the broader term - it can refer to any function you're minimizing or maximizing, not just prediction error. In practice, the two terms are often used interchangeably, but the distinction matters when you're reading research papers.
Where are objective functions used?
Objective functions appear in any problem that involves optimization - machine learning, linear programming, resource allocation, finance, logistics, and more. In machine learning, the objective function defines what the model learns during training. Outside of ML, it defines the goal in any constrained or unconstrained optimization problem.
Why does the choice of objective function matter in machine learning?
The objective function determines what your model is actually optimizing for during training. If you choose the wrong one, your model will minimize the wrong thing - and it'll do it well, which makes the problem harder to spot. For example, using MSE for a classification problem won't just give poor results, it'll give confidently poor results, which is worse.
Can an objective function have multiple minima?
Yes - and this is one of the core challenges in training deep learning models. Non-convex objective functions have multiple local minima, meaning an optimizer can get stuck in a valley that isn't the global best. This is why weight initialization, learning rate scheduling, and optimizer choice all matter - they affect whether your optimizer finds a good solution in the end.



