

Training a Neural Network from Scratch with Gradient Descent in JavaScript

Deep Learning, Machine Learning, Neural Network, JavaScript · 4 min read


TL;DR In this part, you’ll implement a Neural Network and train it with an algorithm called Gradient Descent from scratch.

The problem you’re trying to solve is to predict the number of infected (with a novel virus) patients for the next day, based on historical data. You’ll train a tiny Neural Network to do it!

  • Run the complete source code on CodeSandbox

In this part, you’ll learn how to:

  • Measure the error of your model predictions
  • Implement a simple way to find good weights for your model
  • Implement Gradient Descent - an efficient way to find good weight values

First, we need to set a goal that we want to achieve. Let’s formulate that!

Learning is error reduction

How good are your model predictions? This question is vitally important. It gives you a starting point upon which to improve.

The problem of finding good weight parameters transforms into driving an error measurement down to 0. Here’s one way to measure the error between the predictions and true values:

```javascript
const weight = 0.5
const data = 3

const prediction = data * weight

const trueInfectedCount = 4

const error = (prediction - trueInfectedCount) ** 2
```


We subtract the true value from the prediction and square the result. This is known as squared error. An error of 0 indicates that the prediction is perfect (the same as the true value).

But why the squaring? It ensures that the error is always non-negative, and it grows much faster for larger deviations from the true values. And this is a good thing - we want to change our model more aggressively when it makes huge errors.
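To make that concrete, here's a small illustration (my own, not part of the tutorial's code) of how squaring treats deviations:

```javascript
// Squared error: undershooting and overshooting by the same amount are
// penalized equally, and larger deviations are penalized much more.
const squaredError = (prediction, trueValue) => (prediction - trueValue) ** 2

console.log(squaredError(4.5, 4)) // 0.25 - off by 0.5
console.log(squaredError(2, 4))   // 4    - off by 2 (undershoot)
console.log(squaredError(6, 4))   // 4    - off by 2 (overshoot): 4x the deviation, 16x the error
```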

Learning by going up and down

You have a way to evaluate how good your Neural Network predictions are. How can you use that information to find good weights? We'll start with an approach that is:

  • Simple
  • Inefficient
  • Unable to hit the exact weight value

Remember the “Guess the number” game you played as a kid? You need to find a number based on “greater than” or “less than” feedback.

We’ll choose a step value and calculate the error for going up or down with that step. We’ll take the direction in which the error is smaller.

This sounds simple enough. Let’s try it out:

```javascript
var weight = 0.5
const data = 3

const neuralNet = (data, weight) => data * weight

const error = (prediction, trueValue) => (prediction - trueValue) ** 2

const trueInfectedCount = 4

const STEP_CHANGE = 0.05

for (const i of Array(20).keys()) {
  const prediction = neuralNet(data, weight)
  const currentError = error(prediction, trueInfectedCount)

  console.log(
    `iteration ${i + 1} error: ${currentError} prediction: ${prediction}`
  )

  const upPrediction = neuralNet(data, weight + STEP_CHANGE)
  const upError = error(upPrediction, trueInfectedCount)

  const downPrediction = neuralNet(data, weight - STEP_CHANGE)
  const downError = error(downPrediction, trueInfectedCount)

  if (upError < downError) {
    weight += STEP_CHANGE
  } else {
    weight -= STEP_CHANGE
  }
}
```

```
iteration 1 error: 6.25 prediction: 1.5
iteration 2 error: 5.522499999999998 prediction: 1.6500000000000001
iteration 3 error: 4.839999999999999 prediction: 1.8000000000000003
iteration 4 error: 4.2025 prediction: 1.9500000000000004
iteration 5 error: 3.609999999999998 prediction: 2.1000000000000005
...
iteration 19 error: 0.009999999999999573 prediction: 3.900000000000002
iteration 20 error: 0.0025000000000002486 prediction: 4.0500000000000025
```

Slowly but surely, this method gets the job done! Unfortunately, for practical purposes, this is way too slow. Why?

The step you take is of fixed size. No matter how far away you’re from the minimum of the function (where the error is 0), you still take the same step. This is slow and might cause you to overshoot (miss the minimum error).

We need a way to make the step size dynamic: larger when far from the error minimum and smaller when close to it. How can we do that?

Learning with Gradient Descent

You need to minimize the error, and you have control over only one thing - the weight value(s). In what direction, and by how much, should you change it?

Your goal is to find weight value(s) that move the error to (as close as possible) 0. We can only wiggle the weight and need to understand its relationship with the error. But we already know that:

```javascript
error = (data * weight - trueInfectedCount) ** 2
```
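One quick way to build intuition (a sketch of mine, not part of the original code) is to evaluate the error at a few weight values and watch it dip toward 0 near the best weight, trueInfectedCount / data = 4 / 3:

```javascript
const data = 3
const trueInfectedCount = 4

// Tabulate the error for a few candidate weights - it forms a U shape
// with its bottom at weight = 4 / 3 (where the prediction equals 4)
for (const weight of [0.5, 1.0, 4 / 3, 1.5, 2.0]) {
  const error = (data * weight - trueInfectedCount) ** 2
  console.log(`weight ${weight.toFixed(2)} error ${error.toFixed(4)}`)
}
```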

How can we use this relationship to move the error in the right direction (close to 0)?

Fortunately, the derivative of a function tells you in which direction, and by how much, the function's value changes when you change one of its variables. What is a derivative of a function? Aren't those things scary? I mean, Calculus scary?

The derivative is just the slope at some point in the function. Don’t worry! We’ll dive deeper to understand what this means. Let’s have a look at the first derivative of the error function:

```javascript
errorPrime = 2 * data * (data * weight - trueInfectedCount)
```
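If you'd like to double-check that formula, you can compare it to a numerical approximation of the slope (this sanity check is my own addition):

```javascript
const data = 3
const trueInfectedCount = 4
const errorAt = weight => (data * weight - trueInfectedCount) ** 2

// Central finite difference: slope ≈ (f(w + h) - f(w - h)) / (2h)
const weight = 0.5
const h = 1e-6
const numericalSlope = (errorAt(weight + h) - errorAt(weight - h)) / (2 * h)
const analyticSlope = 2 * data * (data * weight - trueInfectedCount)

console.log(numericalSlope.toFixed(4), analyticSlope.toFixed(4)) // both ≈ -15.0000
```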

You can get the first derivative by using standard derivative tables (or an online derivative calculator such as this one). All looks great. Now let's remove that constant 2 in front of the equation. It is not mathematically precise, but it will keep the results more or less the same:

```javascript
errorPrime = data * (data * weight - trueInfectedCount)
```

It might help you to better understand the error function and its first derivative by graphing them. The error function is quadratic - a U-shaped parabola whose minimum sits where the error is 0.

The derivative tells you in which direction you should change the weight to reduce the error further. It also gives you an idea of how far away you are from the minimum, based on how steep the curve is.
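Here's that idea in code (a sketch of mine): the sign of the slope tells you which side of the minimum you're on, and its magnitude grows with the distance from it:

```javascript
// Simplified derivative of the error with respect to the weight
const data = 3
const trueInfectedCount = 4
const slopeAt = weight => data * (data * weight - trueInfectedCount)

console.log(slopeAt(0.5)) // -7.5: negative slope, minimum is to the right - increase the weight
console.log(slopeAt(2.0)) // 6: positive slope, minimum is to the left - decrease the weight
console.log(slopeAt(4 / 3)) // ~0: we're at the minimum
```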

The good thing is that when using Neural Network libraries (like TensorFlow.js), you won’t need to deal with derivatives. Those get calculated for you.

All right, you can use all that knowledge to implement gradient descent:

```javascript
var weight = 0.5
const data = 3.0
const trueInfectedCount = 4.0

const neuralNet = (data, weight) => data * weight

const error = (prediction, trueValue) => (prediction - trueValue) ** 2

for (const i of Array(20).keys()) {
  const prediction = neuralNet(data, weight)
  const currentError = error(prediction, trueInfectedCount)
  const errorPrime = (data * weight - trueInfectedCount) * data

  weight -= errorPrime

  console.log(
    `iteration ${i + 1} error: ${currentError} prediction: ${prediction}`
  )
}
```

```
iteration 1 error: 6.25 prediction: 1.5
iteration 2 error: 400 prediction: 24
iteration 3 error: 25600 prediction: -156
iteration 4 error: 1638400 prediction: 1284
iteration 5 error: 104857600 prediction: -10236
iteration 6 error: 6710886400 prediction: 81924
...
iteration 18 error: 3.1691265005705735e+31 prediction: 5629499534213124
iteration 19 error: 2.028240960365167e+33 prediction: -45035996273704960
iteration 20 error: 1.2980742146337069e+35 prediction: 360287970189639700
```

Gradient descent iteratively adjusts the Neural Network weight using the magnitude and direction provided by the derivative of our error function.

But what about that output? Those predictions seem to be wrong, by a lot!

Looks like the weight updates are far too aggressive (large). The algorithm simply overshoots the bottom of the U-shaped error function and goes for the stars. Let’s address this issue next!
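A little algebra shows why it diverges (this aside is my own, not from the tutorial): writing wStar = trueInfectedCount / data for the perfect weight, each update multiplies the distance weight - wStar by the constant factor 1 - data * data = -8, so the distance grows eightfold (and flips sign) on every step:

```javascript
const data = 3
const trueInfectedCount = 4
const wStar = trueInfectedCount / data // the weight with 0 error

let weight = 0.5
for (let i = 0; i < 3; i++) {
  const before = weight - wStar
  weight -= (data * weight - trueInfectedCount) * data // same update as above
  const after = weight - wStar
  console.log((after / before).toFixed(2)) // -8.00 every time
}
```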

Slowing down the learning process

Our Neural Network learns way too fast for its own good. We'll introduce another parameter α (alpha), known as the learning rate, that controls how much our model should learn on each step. That way, we'll cut down the huge (overshooting) updates.

```javascript
var weight = 0.5
const data = 3.0
const trueInfectedCount = 4.0
const ALPHA = 0.1

const neuralNet = (data, weight) => data * weight

const error = (prediction, trueValue) => (prediction - trueValue) ** 2

for (const i of Array(20).keys()) {
  const prediction = neuralNet(data, weight)
  const currentError = error(prediction, trueInfectedCount)
  const errorPrime = (data * weight - trueInfectedCount) * data

  weight -= ALPHA * errorPrime

  console.log(
    `iteration ${i + 1} error: ${currentError} prediction: ${prediction}`
  )
}
```

```
iteration 1 error: 6.25 prediction: 1.5
iteration 2 error: 0.0625 prediction: 3.75
iteration 3 error: 0.0006250000000000178 prediction: 3.9749999999999996
iteration 4 error: 0.0000062499999999997335 prediction: 3.9975
iteration 5 error: 6.249999999993073e-8 prediction: 3.99975
...
iteration 16 error: 4.930380657631324e-30 prediction: 3.999999999999998
iteration 17 error: 0 prediction: 4
iteration 18 error: 0 prediction: 4
iteration 19 error: 0 prediction: 4
iteration 20 error: 0 prediction: 4
```

This looks much better! Note that we no longer need to specify the direction or the amount of the update - the derivative takes care of both!

But what about that α value? How can you come up with it? In Machine Learning lingo, this is a hyperparameter - a value you have to find on your own, mostly by trial and error. Sad, I know, but there are more sophisticated ways to handle this issue.


Great job! You have a tiny Neural Network for which you found good weight values! At least, it looks this way, given it makes a correct prediction.

In this part, you learned how to:

  • Measure the error of your model predictions
  • Implement a simple way to find good weights for your model
  • Implement Gradient Descent - an efficient way to find good weight values

Yes, we looked at a toy example. In the real world, you (hopefully) have more data and need a more general way to find good weight values for your Neural Networks. We’ll have a look at Generalized Gradient Descent next!


