trekhleb / nano-neuron, Hacker News

7 simple JavaScript functions that will give you a feeling of how machines can actually “learn”.

(TL; DR)

NanoNeuronisOver-simplifiedversion of a Neuron concept from the Neural Networks. NanoNeuron is trained to convert a temperature values from Celsius to Fahrenheit.

NanoNeuron.jscode example contains 7 simple JavaScript functions ( model prediction, cost calculation, forward and backwards propagation, training) that will give you a feeling of how machines can actually “learn”. No 3rd-party libraries, no external data-sets and dependencies, only pure and simple JavaScript functions.

functionsThese functions by any means are (NOT) a complete guide to machine learning. A lot of machine learning concepts are skipped and over-simplified there! This simplification is done in purpose to give the reader a reallybasicunderstanding and feeling of how machines can learn and ultimately to make it possible for the reader to call it not a “machine learning MAGIC” but rather “machine learning MATH”🤓.

What NanoNeuron will learn

You’ve probably heard about Neurons in the context of (Neural Networks) . NanoNeuron that we’re going to implement below is kind of it but much simpler. For simplicity reasons we’re not even going to build a network on NanoNeurons. We will have it all by itself, alone, doing some magic predictions for us. Namely we will teach this one simple NanoNeuron to convert (predict) the temperature from Celsius to Fahrenheit.

By the way the formula for converting Celsius to Fahrenheit is this:

But for now our NanoNeuron doesn’t know about it …

NanoNeuron model

Let’s implement our NanoNeuron model function. It implements basic linear dependency between (x) andYwhich looks like (y=w * x b) . Simply saying our NanoNeuron is a “kid” that can draw the straight line in (XY) coordinates.

Variablesw, (B) are parameters of the model. NanoNeuron knows only about these two parameters of linear function. These parameters are something that NanoNeuron is going to “learn” during the training process.

The only thing that NanoNeuron can do is to imitate linear dependency. In its (predict)) method it accepts some inputxand predicts the outputY. No magic here.

functionNanoNeuron(w,  b) ) {   this.W=w;   this.  (b)=b;   this.predict=( (x) )=>{     returnx*this.wthis.  b) ;   } }

… … wait …linear regressionis it you?)🧐

Celsius to Fahrenheit conversion

The temperature value in Celsius can be converted to Fahrenheit using the following formula:F=1.8 * C 32, where (C) is a temperature in Celsius andfis calculated temperature in Fahrenheit.

functioncelsiusToFahrenheit( (C) ) {    (const)w=(1.8) ;    (const)B=32;    (const)f=(c)  *  (W)  b;   returnf; };

Ultimately we want to teach our NanoNeuron to imitate this function (to learn thatW=1.8and B=32) without knowing these parameters in advance.

This is how the Celsius to Fahrenheit conversion function looks like:

(Generating data-sets)

Before the training we need to generatetrainingandtest data-setsbased oncelsiusToFahrenheit ()function. Data-sets consist of pairs of input values and correctly labeled output values.

In real life in most of the cases this data would be rather collected than generated. For example we might have a set of images of hand-drawn numbers and corresponding set of numbers that explain what number is written on each picture.

We will use TRAINING examples data to train our NanoNeuron. Before our NanoNeuron will grow and will be able to make decisions by its own we need to teach it what is right and what is wrong using training examples.

We will use TEST examples to evaluate how well our NanoNeuron performs on the data that it didn’t see during the training. This is the point where we could see that our “kid” has grown and can make decisions on its own.

functiongenerateDataSets() {   //xTrain ->[0, 1, 2, ...],  //yTrain ->[32, 33.8, 35.6, ...]   (const)xTrain=[];    (const)yTrain=[];   for(let  (x)=(0) ; x100; x= (1) ) {      (const)Y=celsiusToFahrenheit(x);     xTrain.push(x);     yTrain.push(y);   }    //xTest ->[0.5, 1.5, 2.5, ...]  //yTest ->[32.9, 34.7, 36.5, ...]   (const)xTest=[];    (const)yTest=[];   //By starting from 0.5 and using the same step of 1 as we have used for training set  //we make sure that test set has different data comparing to training set.  for(let  (x)=(0.5) ; x100; x= (1) ) {      (const)Y=celsiusToFahrenheit(x);     xTest.push(x);     yTest.push(y);   }    return[xTrain, yTrain, xTest, yTest]; }

The cost (the error) of prediction

We need to have some metric that will show how close our model’s prediction to correct values. The calculation of the cost (the mistake) between the correct output value ofYandpredictionthat NanoNeuron made will be made using the following formula:

This is a simple difference between two values. The closer the values to each other the smaller the difference. We’re using power of (2) here just to get rid of negative numbers so that(1 - 2) ^ 2would be the same as(2 - 1) ^ 2. Division by (2) is happening just to simplify further backward propagation formula (see below).

The cost function in this case will be as simple as:

functionpredictionCost( (Y) ,Prediction) {   return(y-prediction)**(2)/(2) ;//ie ->235 6}

Forward propagation

To do forward propagation means to do a prediction for all training examples fromxTrainandyTraindata-sets and to calculate the average cost of those prediction along the way.

We just let our NanoNeuron say its opinion at this point, just ask him to guess how to convert the temperature. It might be stupidly wrong here. The average cost will show how wrong our model is right now. This cost value is really valuable since by changing the NanoNeuron parameterswandBand by doing the forward propagation again we will be able to evaluate if NanoNeuron became smarter or not after parameters changes.

The average cost will be calculated using the following formula:

Where (m) is a number of training examples (in our case is100).

Here is how we may implement it in code:

functionforwardPropagation(model,xTrain,yTrain) {    (const)M=xTrain.length;    (const)predictions=[];   letcost=(0) ;   for(let  (i)=(0) ; im; i=(1)  ) {      (const)prediction=nanoNeuron.  (predict)  (xTrain [i]);     Cost=predictionCost(yTrain [i], prediction);     predictions.push(prediction);   }   //We are interested in average cost.  Cost/=m;   return[predictions, cost]; }

Backward propagation

Now when we know how right or wrong our NanoNeuron’s predictions are (based on average cost at this point) what should we do to make predictions more precise?

The backward propagation is the answer to this question. Backward propagation is the process of evaluating the cost of prediction and adjusting the NanoNeuron’s parameters (w) and (b) so that next predictions would be more precise.

This is the place where machine learning looks like a magic 🧞‍♂️. The key concept here isderivativewhich show what step to take to get closer to the cost function minimum.

Remember, finding the minimum of a cost function is the ultimate goal of training process. If we will find such values ofwand (b) that our average cost function will be small it would mean that NanoNeuron model does really good and precise predictions.

Derivatives are big separate topic that we will not cover in this article.MathIsFunis a good resource to get a basic understanding of it.

One thing about derivatives that will help you to understand how backward propagation works is that derivative by its meaning is a tangent line to the function curve that points out the direction to the function minimum.

Image source:MathIsFun

For example on the plot above you see that if we’re at the point of(x=2, y=4)than the slope tells us to goleftand (down) to get to function minimum. Also notice that the bigger the slope the faster we should move to the minimum.

The derivatives of ouraverageCostfunction for parameterswand (b) looks like this:

Where (m) is a number of training examples (in our case is100).

You may read more about derivative rules and how to get a derivative of complex functions (here.

functionbackwardPropagation(predictions,xTrain,yTrain) {    (const)M=xTrain.length;   //At the beginning we don't know in which way our parameters 'w' and 'b' need to be changed.  //Therefore we're setting up the changing steps for each parameters to 0.  letdW=(0) ;   letdB=(0) ;   for(let  (i)=(0) ; im; i=(1)  ) {     dW=(yTrain [i]-predictions [i])*xTrain [i];     dB=yTrain [i]-predictions [i];   }   //We're interested in average deltas for each params.  dW/=m;   dB/=m;   return[dW, dB]; }

Training the model

Now we know how to evaluate the correctness of our model for all training set examples (forward propagation), we also know how to do small adjustments to parameters (W) and (b) of the NanoNeuron model (backward propagation) . But the issue is that if we will run forward propagation and then backward propagation only once it won’t be enough for our model to learn any laws / trends from the training data. You may compare it with attending a one day of elementary school for the kid. He / she should go to the school not once but day after day and year after year to learn something.

So we need to repeat forward and backward propagation for our model many times. That is exactly whattrainModel ()function does. it is like a “teacher” for our NanoNeuron model:

it will spend some time (epochs) with our yet slightly stupid NanoNeuron model and try to train / teach it,
it will use specific “books” (xTrainandyTrain (data-sets) for training,
it will push our kid to learn harder (faster) by using a learning rate parameteralpha

A few words about learning rate (alpha) . This is just a multiplier fordWanddBvalues we have calculated during the backward propagation. So, derivative pointed us out to the direction we need to take to find a minimum of the cost function ( (dW) and (dB) sign) and it also pointed us out how fast we need to go to that direction (dWand (dB) absolute value). Now we need to multiply those step sizes toalphajust to make our movement to the minimum faster or slower. Sometimes if we will use big value of (alpha) we might simple jump over the minimum and never find it.

The analogy with the teacher would be that the harder he pushes our “nano-kid” the faster our “nano-kid” will learn but if the teacher will push too hard the “kid” will have a nervous breakdown and won’t be able to learn anything 🤯.

Here is how we’re going to update our model’swand (b) params:

And here is out trainer function:

functiontrainModel({model, epochs, alpha, xTrain, yTrain}) {   //The is the history array of how NanoNeuron learns.   (const)costHistory=[];    //Let's start counting epochs.  for(letepoch=(0) ; epochepochs; epoch=(1)  ) {     //Forward propagation.     (const)  [predictions,cost]=forwardPropagation(model, xTrain, yTrain);     costHistory.push(cost);        //Backward propagation.     (const)  [dW,dB]=backwardPropagation(predictions, xTrain, yTrain);        //Adjust our NanoNeuron parameters to increase accuracy of our model predictions.    nanoNeuron.W=(alpha)  *dW;     nanoNeuron.  (b)=(alpha)  *dB;   }    returncostHistory; }

Putting all pieces together

Now let’s use the functions we have created above.

Let’s create our NanoNeuron model instance. At this moment NanoNeuron doesn’t know what values should be set for parameterswand (b) . So let’s set upwand (b) Randomly.

(const)w=Math.  (random)  ();//ie ->0. 9492(const)B=Math.  (random)  ();//ie ->0. 4570(const)nanoNeuron=(new)NanoNeuron(w, b);

Generate training and test data-sets.

(const)  [xTrain,yTrain,xTest,yTest]=(generateDataSets)  ();

Let’s train the model with small (0. 0005) steps during the70000epochs. You can play with these parameters, they are being defined empirically.

(const)epochs=70000;  (const)alpha=0. 0005;  (const)trainingCostHistory=trainModel({model:nanoNeuron, epochs, alpha, xTrain, yTrain});

Let’s check how the cost function was changing during the training. We’re expecting that the cost after the training should be much lower than before. This would mean that NanoNeuron got smarter. The opposite is also possible.

console.  (log)  ('Cost before the training:', trainingCostHistory [0]);//ie ->4694. 3335043console.log(''Cost after the training:'', trainingCostHistory [epochs-1]);//ie ->0. 0000024

This is how the training cost changes over the epochs. On the (x) axes is the epoch number x 1000.

Let’s take a look at NanoNeuron parameters to see what it has learned. We expect that NanoNeuron parameterswand (b) to be similar to ones we have incelsiusToFahrenheit () (function) (w=1.8) and (b=)) since our NanoNeuron tried to imitate it.

console.  (log)  ('NanoNeuron parameters:', {w:(nanoNeuron) .w, B:nanoNeuron.b});//ie ->{w: 1.8, b: 31. 99}

Evaluate our model accuracy for test data-set to see how well our NanoNeuron deals with new unknown data predictions. The cost of predictions on test sets is expected to be be close to the training cost. This would mean that NanoNeuron performs well on known and unknown data.

[testPredictions, testCost]=forwardPropagation( nanoNeuron, xTest, yTest);console.log(''Cost on new testing data:'', testCost);//ie ->0. 0000023

Now, since we see that our NanoNeuron “kid” has performed well in the “school” during the training and that he can convert Celsius to Fahrenheit temperatures correctly even for the data it hasn’t seen we can call it “smart” and ask him some questions. This was the ultimate goal of whole training process.

(const)tempInCelsius=70;  (const)customPrediction=nanoNeuron.Predict  (tempInCelsius);console.log(``NanoNeuron "thinks" that$ {tempInCelsius}° C in Fahrenheit is:`, customPrediction);//->158 .00 02console.log(''Correct answer is:',celsiusToFahrenheit(tempInCelsius));//->158

So close! As all the humans our NanoNeuron is good but not ideal:)

Happy learning to you!

How to launch NanoNeuron

You may clone the repository and run it locally:

git clone https://github.com/trekhleb/nano-neuron.git  (cd)  nano-neuron

Skipped machine learning concepts

The following machine learning concepts were skipped and simplified for simplicity of explanation.

Train / test sets splitting

Normally you have one big set of data. Depending on the number of examples in that set you may want to split it in proportion of 70 / 30 for train / test sets. The data in the set should be randomly shuffled before the split. If the number of examples is big (ie millions) then the split might happened in proportions that are closer to 90 / 10 or 95 / 5 for train / test data-sets.

The network brings the power

Normally you won’t notice the usage of just one standalone neuron. The power is in thenetworkof such neurons. Network might learn much more complex features. NanoNeuron alone looks more like a simplelinear regressionthan neural network.

Input normalization

Before the training it would be better tonormalize input values .

Vectorized implementation

For networks the vectorized (matrix) calculations work much faster thanforloops. Normally forward / backward propagation works much faster if it is implemented in vectorized form and calculated using, for example, (Numpy) ***************************** (Python library.)

Minimum of cost function

The cost function that we were using in this example is over-simplified. It should have (logarithmic components) . Changing the cost function will also change its derivatives so the back propagation step will also use different formulas.

Activation function

Normally the output of a neuron should be passed through activation function likeSigmoidorReLUor others.

(Read More)