
Newton's Method: Find Roots Fast with Iterative Approximation

Newton's method is an iterative root-finding algorithm that uses tangent line approximations to close in on the solution of equations that have no closed-form answer.
Apr 15, 2026  · 11 min read

Some equations just don't have a clean algebraic solution.

You can factor and substitute all you want, but some equations have no closed form. For example, a polynomial of degree five or higher has no general algebraic solution. Functions that mix exponentials with polynomials, like e^x = 3x, fall into the same category. You need a different approach in these cases.

Newton's method is that approach. It finds roots numerically by making smarter and smarter guesses - each one guided by the tangent line of the function at the current estimate.

In this article, I'll walk you through the formula behind Newton's method, how it works step by step, when it converges, and when it doesn't - with concrete examples to make the theory stick.

Looking for more math topics you need to know as a data scientist? Read our Geometric Series: Formula, Convergence, and Examples blog post to see how it applies to finance, physics, and CS.

What Is Newton’s Method?

Newton's method is an iterative technique for finding the roots of a function. The roots are the input values where the function equals zero.

You start this process with an initial guess. Then, the method uses the geometry of the function at that point to make a better guess. You repeat this process, and each iteration gets you closer to the actual root.

That's the whole idea. You just need a smart, repeatable update rule that converges on the answer.

The Newton's Method Formula

The core of Newton's method is a single update rule you repeatedly apply until you're close enough to the root.

Here's the formula:

x_{n+1} = x_n - f(x_n) / f'(x_n)

Each iteration takes your current estimate x_n and produces a better one, x_{n+1}. You keep updating until the result is close enough to zero.

The formula has three components:

  • x_n - your current estimate of the root

  • f(x_n) - the function's value at that estimate

  • f'(x_n) - the derivative of the function at that estimate, which tells you the slope of the tangent line

If f(x_n) is large, you're far from the root. If f'(x_n) is steep, the function is changing fast, so you can take a bigger step. The ratio f(x_n) / f'(x_n) tells you exactly how far to move - and you subtract it from your current guess to get closer.

If f'(x_n) is zero or near zero, the formula won’t really work. You'd be dividing by zero, which means the method can't produce a next estimate. I'll cover this in more detail in the limitations section.
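To make the update rule concrete, here's a minimal Python sketch of a single update step. The function f(x) = x^2 - 2 is just an illustration, and the name `newton_step` is my own:

```python
def newton_step(x, f, f_prime):
    """One Newton update: subtract f(x) / f'(x) from the current guess."""
    return x - f(x) / f_prime(x)

# One step on f(x) = x**2 - 2, starting from x = 2.5
x1 = newton_step(2.5, lambda x: x**2 - 2, lambda x: 2 * x)
print(x1)  # 1.65 - already much closer to sqrt(2) ≈ 1.4142
```

One call moves the estimate from 2.5 to 1.65; repeating the call is all the full algorithm does.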

How Newton's Method Works

Newton's method follows the same four steps on every iteration.

  1. Choose an initial guess: Pick a starting value x_0 somewhere near the root. You don't need to be exact - just close enough that the function behaves predictably around that point. I'll cover what "close enough" means in the convergence section.

  2. Compute the function value: Evaluate f(x_0). This tells you how far the function is from zero at your current estimate. If f(x_0) = 0, you're done - you found the root.

  3. Compute the derivative: Evaluate f'(x_0). This gives you the slope of the function at x_0, which is the slope of the tangent line at that point.

  4. Update the estimate: Apply the update rule according to the formula from the previous section.

That's one full iteration!

This new value x_1 is where the tangent line crosses the x-axis. Geometrically, you're drawing a straight line that touches the curve at x_0 and following it down to zero. That intersection point is your next, better guess.

Then you repeat. Plug x_1 back into steps 2 through 4 to get x_2, then x_3, and so on. Each iteration draws a new tangent line at the updated point and finds where it crosses the x-axis.

The process stops when f(x_n) is close enough to zero - typically when it falls below some small threshold you define upfront.
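Put together, the four steps translate into a short loop. This is a sketch rather than a production implementation; the function name and the tolerance value are my choices:

```python
def newton(f, f_prime, x0, tol=1e-10, max_iter=50):
    """Repeat the four steps until |f(x)| falls below the threshold."""
    x = x0                         # step 1: initial guess
    for _ in range(max_iter):
        fx = f(x)                  # step 2: function value
        if abs(fx) < tol:          # stopping condition: close enough to zero
            return x
        x = x - fx / f_prime(x)    # steps 3-4: tangent-line update
    return x

root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=2.5)
print(root)  # ≈ 1.41421356...
```

The `max_iter` cap is a safety net so the loop can't run forever on a function that never converges.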

Geometric Interpretation of Newton's Method

Picture a curve on a graph - that's your function f(x). The root is where the curve crosses the x-axis. You don't know where that crossing is yet, so you start with a guess x_0 somewhere on the x-axis.

At each step, you plot the point (x_0, f(x_0)) on the curve, then draw the tangent line at that point - a straight line that touches the curve there and follows its slope. That tangent line isn't horizontal. It's tilted, and if you follow it down, it crosses the x-axis at some point. That crossing is your next estimate, x_1.

Then you repeat. At x_1, you draw a new tangent line and find where it crosses the x-axis. That gives you x_2. Each tangent line is a local linear approximation of the curve, and each crossing point lands closer to the actual root.

The chart below shows two iterations of Newton's method applied to f(x) = x^2 - 2, starting from x_0 = 2.5:

Geometric interpretation chart

This works because a tangent line is the best straight-line approximation of a curve at any given point. The closer you are to the root, the more the tangent line resembles the curve itself - and the more accurate your next step becomes.

In practice, the estimates don't just creep toward the root. They jump there fast, often doubling the number of correct decimal places with each iteration.

Step-by-Step Example of Newton's Method

Let's apply Newton's method to f(x) = x^2 - 2. The root of this function is x = sqrt(2) ≈ 1.4142 - in other words, we're computing the square root of 2.

The derivative is f'(x) = 2x, so the update rule becomes:

x_{n+1} = x_n - (x_n^2 - 2) / (2x_n)

Let’s start with an initial guess of x_0 = 2.5.

Iteration 1:

x_1 = 2.5 - (2.5^2 - 2) / (2 · 2.5) = 2.5 - 4.25 / 5 = 1.65

Iteration 2:

x_2 = 1.65 - (1.65^2 - 2) / (2 · 1.65) = 1.65 - 0.7225 / 3.3 ≈ 1.4311

Iteration 3:

x_3 = 1.4311 - (1.4311^2 - 2) / (2 · 1.4311) ≈ 1.4143

After just three iterations, we're already accurate to four decimal places. The error dropped from 1.086 at x_0 to 0.0001 at x_3 - and it keeps reducing with each step.
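You can reproduce these iterations with a few lines of Python (a sketch; the loop just prints each estimate and its distance from the true root):

```python
import math

f = lambda x: x**2 - 2
f_prime = lambda x: 2 * x

x = 2.5
for n in range(4):
    error = abs(x - math.sqrt(2))
    print(f"x_{n} = {x:.6f}, error = {error:.6f}")
    x = x - f(x) / f_prime(x)
```

The printed errors shrink from about 1.086 at x_0 to about 0.0001 at x_3, matching the numbers above.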

Here's how the estimates and errors evolve visually:

Visual overview of estimate and error

The left panel shows how each estimate gets closer to sqrt(2) ≈ 1.4142, while the right panel shows the error getting smaller on a log scale - each iteration roughly squaring the precision of the previous one.

Convergence of Newton's Method

Newton's method can converge fast, but only under the right conditions.

When your initial guess is close to the root and the function is smooth in that region, the method exhibits quadratic convergence. That's the technical term for what you saw in the example: each iteration roughly squares the error from the previous one. Two correct decimal places become four, four become eight, and so on.

Two conditions need to hold for this to work:

  • A good initial guess: The closer x_0 is to the actual root, the faster the method converges. If you start too far away, the tangent line at that point may send you in the wrong direction.
  • A well-behaved function: The function needs to be smooth and differentiable near the root. Sharp turns or flat regions can interfere with the tangent line approximation.

The most common failure mode is a derivative near zero. 

If f'(x_n) is close to zero, you're dividing by a very small number in the update rule, which sends the next estimate far from the root. In the worst case, f'(x_n) = 0 and the calculations stop working because you can’t divide by zero.

A poor starting point can also cause the method to oscillate or diverge. Instead of closing in on the root, the estimates jump back and forth or drift further away with each iteration.
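A classic illustration of oscillation (my example, not from the sections above) is f(x) = x^3 - 2x + 2 with x_0 = 0: the estimates bounce between 0 and 1 forever:

```python
f = lambda x: x**3 - 2 * x + 2
f_prime = lambda x: 3 * x**2 - 2

x = 0.0
history = []
for _ in range(6):
    history.append(x)
    x = x - f(x) / f_prime(x)  # the update keeps flipping between 0 and 1
print(history)  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

At x = 0 the tangent line points to x = 1, and at x = 1 it points straight back to x = 0, so the method never escapes the cycle.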

Newton's method rewards good setup. A reasonable initial guess and a smooth function are all it needs to converge, and converge fast.

Advantages of Newton's Method

When conditions are right, Newton's method is hard to beat.

The biggest advantage is quadratic convergence. Most numerical methods close in on the root at a linear rate, meaning each iteration shrinks the error by a roughly constant factor. Newton's method roughly squares the error instead, which means it gets accurate fast with very few iterations.

It's also general-purpose. You can apply it to a wide range of functions - polynomial, trigonometric, exponential - without changing anything. That's why it shows up across so many fields, from engineering simulations to training machine learning models.

Limitations of Newton's Method

Newton's method asks a lot in return for that speed. Here are a couple of limitations to keep in mind:

  • It requires a derivative: You need an analytical expression for f'(x) before you can run a single iteration. For functions where the derivative is hard to compute (or doesn't exist), you need a different approach.

  • It's sensitive to the initial guess: If you start too far from the root, the method can send you in the wrong direction.

  • It may not converge: If the function has flat regions or sharp corners, the tangent line approximation just doesn’t work.

  • It can diverge or oscillate: In bad cases, the estimates fail to converge and drift further from the root or indefinitely bounce back and forth.

So before you reach for Newton’s method, make sure you understand your function.

Newton's Method vs Other Root-Finding Methods

Newton's method isn't the only way to find roots, and it's not always the right one for you.

Two other methods often come up: the bisection method and the secant method. Let me briefly explain these.

Bisection method

The bisection method is the simplest of the three. You start with an interval [a, b] where the function changes sign - meaning a root must exist somewhere inside. Then you repeatedly cut the interval in half, keeping the half that still contains the sign change.

It works, but it's slow. The error reduces by half with each iteration, which is linear convergence. But it's also guaranteed to work as long as the function is continuous and your initial interval brackets a root. No derivatives required.
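Here's a minimal bisection sketch (it assumes f(a) and f(b) have opposite signs; the implementation details are my own):

```python
def bisection(f, a, b, tol=1e-10):
    """Repeatedly halve [a, b], keeping the half where f changes sign."""
    fa = f(a)
    while b - a > tol:
        mid = (a + b) / 2
        if fa * f(mid) <= 0:
            b = mid               # sign change is in [a, mid]
        else:
            a, fa = mid, f(mid)   # sign change is in [mid, b]
    return (a + b) / 2

print(bisection(lambda x: x**2 - 2, 1.0, 2.0))  # ≈ 1.41421356...
```

Note that it only ever evaluates f, never f', which is exactly why it's so robust.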

Secant method

The secant method is a close relative of Newton's method. Instead of analytically computing the derivative, it approximates it using two previous estimates:

x_{n+1} = x_n - f(x_n) · (x_n - x_{n-1}) / (f(x_n) - f(x_{n-1}))

This is a good approach when the derivative is hard to compute. You pay for it with convergence speed - the secant method is faster than bisection but slower than Newton's method.
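In code, the secant method looks almost identical to Newton's method, except the slope comes from the two latest points (a sketch; names and tolerances are mine):

```python
def secant(f, x0, x1, tol=1e-10, max_iter=50):
    """Like Newton's method, but with a finite-difference slope."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if abs(f1) < tol:
            return x1
        # the slope (f1 - f0) / (x1 - x0) replaces the analytical derivative
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
    return x1

print(secant(lambda x: x**2 - 2, 1.0, 2.0))  # ≈ 1.41421356...
```

Two starting points are needed instead of one, because the first slope estimate requires two function values.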

Applications of Newton's Method

Newton's method shows up across science, engineering, and machine learning. Let me show you exactly where.

Numerically solving equations

The most direct application. When a function has no closed-form solution, Newton's method finds the root. This comes up constantly in scientific computing - think finding equilibrium points in chemical reactions or solving transcendental equations in signal processing.

Optimization

Finding the minimum or maximum of a function f(x) means finding where its derivative f'(x) = 0. That's a root-finding problem - which means Newton's method can be applied. You just run the algorithm on f'(x) instead of f(x), using the second derivative f''(x) in place of the first.

This variant is called Newton's method for optimization, and it converges faster than gradient descent on smooth, well-behaved functions.
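As a sketch (the example function is my own): to minimize f(x) = e^x - 2x, you run Newton's method on f'(x) = e^x - 2, with f''(x) = e^x playing the role of the slope:

```python
import math

# Minimizing f(x) = e^x - 2x by finding the root of its derivative.
f_prime = lambda x: math.exp(x) - 2   # the function we drive to zero
f_second = lambda x: math.exp(x)      # second derivative, used as the slope

x = 1.0
for _ in range(10):
    x = x - f_prime(x) / f_second(x)
print(x)  # ≈ 0.6931 = ln(2), the minimizer
```

The loop converges to ln(2), which is where e^x = 2 and the derivative vanishes.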

Machine learning

In machine learning, training a model means minimizing a loss function. Newton's method and its variants show up in a couple places here.

L-BFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno) is a quasi-Newton optimizer that approximates the second derivative to avoid computing it directly. It's a standard choice for logistic regression and other convex problems. Newton's method is also the basis for the Newton-Raphson updates used in statistical model fitting, such as generalized linear models.

Physics and engineering

Newton's method is everywhere in simulation and design. Engineers use it to solve nonlinear systems of equations that describe physical systems - think structural stress analysis and fluid dynamics. In each case, the underlying problem reduces to finding where a set of equations equals zero.

Common Mistakes with Newton's Method

Most errors with Newton's method come down to the same four mistakes. Let me go through them:

  • Starting too far from the root: A poor initial guess is the most common reason the method diverges or oscillates. If you don't have a good intuition for where the root is, plot the function first. This will tell you where to start.

  • Getting the derivative wrong: The update rule depends on f'(x). An incorrect derivative - whether from a calculation error or a coding mistake - produces wrong estimates from the very first iteration, and the error compounds with iterations.

  • Not checking for division by zero: If f'(x_n) equals zero or gets very close to it, the update step can’t work. Add a guard in your implementation: if the derivative falls below some small threshold, stop and report the failure rather than producing a nonsense result.

  • Stopping too early: Cutting off the iterations before the estimate has converged leaves you with an answer that looks close but isn't. Set your stopping condition on the actual error - either |f(x_n)| or |x_{n+1} - x_n| falling below a threshold you've chosen deliberately, not just a fixed number of iterations.
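The guards above can be combined into one defensive implementation (a sketch; the function and parameter names are my own):

```python
def safe_newton(f, f_prime, x0, tol=1e-10, slope_tol=1e-12, max_iter=100):
    """Newton's method with a division-by-zero guard and an error-based stop."""
    x = x0
    for _ in range(max_iter):
        fx, slope = f(x), f_prime(x)
        if abs(slope) < slope_tol:
            # report the failure instead of producing a nonsense result
            raise ValueError(f"derivative too small at x={x}; cannot continue")
        step = fx / slope
        x -= step
        # stop on the actual error, not a fixed iteration count
        if abs(fx) < tol or abs(step) < tol:
            return x
    raise ValueError("did not converge within max_iter iterations")

print(safe_newton(lambda x: x**2 - 2, lambda x: 2 * x, 2.5))  # ≈ 1.41421356...
```

Raising an exception on failure makes the caller deal with a bad function or starting point explicitly, rather than silently receiving a garbage estimate.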

Conclusion

Newton's method is one of the most useful tools in numerical computing. A single update rule, applied repeatedly, can find roots to arbitrary precision in just a couple of iterations.

You pay for that speed with conditions. You need a good initial guess, a function that is smooth (neither flat nor sharply curved) near the root, and a non-zero derivative to achieve fast convergence. Understand these conditions, and you'll know when to reach for Newton's method and when to use something else (like the bisection or secant methods).

The best way to build that intuition is to practice on simple examples. Start with f(x) = x^2 - 2, try different starting points, and watch what happens. Move on to functions with multiple roots or flat regions and see where the method breaks down.

If you like the concept of optimization through iteration, you should also know about gradient descent. Read our Gradient Descent in Machine Learning: A Deep Dive to learn how it optimizes machine learning models.


Author
Dario Radečić
Senior Data Scientist based in Croatia. Top Tech Writer with over 700 articles published, generating more than 10M views. Author of Machine Learning Automation with TPOT.

FAQs

What is Newton's method used for?

Newton's method is a numerical technique for finding the roots of a function - the values of x where f(x) = 0. It's used across science, engineering, and machine learning whenever an equation has no clean algebraic solution. Common applications include solving nonlinear equations, fitting statistical models, and powering optimization algorithms like L-BFGS.

How many iterations does Newton's method need to converge?

It depends on the function and the initial guess, but Newton's method typically converges in very few iterations when conditions are right. Thanks to quadratic convergence, the number of correct decimal places roughly doubles with each step. In practice, just a couple of iterations is often enough to reach machine precision.

What happens if Newton's method doesn't converge?

If the initial guess is too far from the root, or if the function has a flat region near the starting point, the method can diverge or oscillate instead of converging. A derivative close to zero is a common cause - it sends the next estimate far off course. In these cases, switching to a more stable method like bisection, or improving the initial guess, usually fixes the problem.

What is the difference between Newton's method and the secant method?

Both methods use the same core update idea, but Newton's method requires the analytical derivative f'(x), while the secant method approximates it using two previous estimates. The secant method works well when the derivative is hard to compute, but it converges a bit slower than Newton's method.

What does quadratic convergence mean in Newton's method?

Quadratic convergence means the error at each iteration is roughly proportional to the square of the error from the previous iteration. In plain terms, if you have two correct decimal places, the next iteration gives you four, then eight, and so on. This is what makes Newton's method so fast compared to methods like bisection, which only cut the error in half each time.
