Product Bytes ✨

Understanding the way our brain works - Introduction to the forward-forward pass

Jan 12, 2023 · 3 minute read

Our brains are one of the most complex things created by evolution, and as humans we have always been fascinated by them. One of the ways we try to apply the workings of the brain is through neural networks: computing systems inspired by the biological neural networks present in our brains. Many mathematical models have tried to explain how our brain works and to apply those ideas to neural networks. One such algorithm, backpropagation, has been used widely, but it has its own set of challenges such as time complexity and power consumption. So a new algorithm has been proposed to replace backpropagation: the forward-forward pass. Here we will discuss it in more detail.

Neural networks

Neural networks are not a new concept. They were first proposed in 1943 by Warren McCulloch and Walter Pitts. The field went through periods of enthusiasm and decline, and it wasn't until the 1980s that neural networks received much attention; their use has increased dramatically in recent years thanks to advances in graphics processing. What made neural networks popular was the popularisation of backpropagation, proposed in 1986 in the paper Learning Representations by Back-Propagating Errors. But what exactly are neural networks? A simple neural network is an interconnection of processing nodes, each of which is called a perceptron or an artificial neuron.

 

Fig. 1. A perceptron model that takes in 3 inputs and performs processing on them (image credits: https://towardsdatascience.com/what-is-a-perceptron-210a50190c3b)

 

So it can be said that perceptrons are the building blocks of a neural network. A perceptron takes in various inputs, combines them with its internal weights, applies an activation function and returns either a 0 or a 1, similar to the activation or non-activation of a neuron in the brain.

As the field has progressed, there have been various ways a perceptron can produce its output, ranging over (-1, 1) through a variety of activation functions such as tanh and sigmoid. Mathematically, a single perceptron can be defined as follows:

y = f(w · x + b)

where x is the input vector, w is the corresponding weight vector, b is the bias and f is the activation function.
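
To make this concrete, here is a minimal NumPy sketch of a single perceptron. The three-input shape, the sigmoid activation and the example values are illustrative assumptions, not anything prescribed above.

```python
import numpy as np

def sigmoid(z):
    # squash the pre-activation into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    # weighted sum of the inputs plus the bias, followed by the activation
    return sigmoid(np.dot(w, x) + b)

# randomly initialised weights and bias for a perceptron with 3 inputs
rng = np.random.default_rng(0)
w = rng.normal(size=3)
b = rng.normal()

x = np.array([0.5, -1.0, 2.0])   # example input vector
print(perceptron(x, w, b))       # a value between 0 and 1
```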

These weights and biases are randomly initialised and then updated throughout training. But a question arises: how are they updated? That is where the learning algorithm comes in.

Backpropagation

Backpropagation computes the gradients of a loss function and then updates the weights accordingly. But first, let's recap what has been done so far. We create a prediction from the randomly initialised weights and biases; let's call that prediction ŷ and the true value y. Let's also have a learning rate η, which controls how fast the weights update:

ŷ = w · x + b

Now there are many loss functions, but for simplicity we will choose the L1 loss. Let's denote the loss function as L:

L = |y − ŷ|

Backpropagation then computes the derivatives of this loss with respect to the weights and biases:

∇w = ∂L/∂w

∇b = ∂L/∂b

And the weights and biases are updated as 

w = w − η · ∇w

b = b − η · ∇b

This is the basic working of backpropagation on a single perceptron. The same concept is used for huge neural networks by applying the chain rule of derivatives. In such networks, we follow two steps. The first step is to calculate the output of each neuron and pass it as input to the neurons in the next layer, and so on; this is known as the forward pass. The second step is the backward pass: compute the loss and the gradients at each neuron and use them to update the weights and biases. This is repeated for a number of training examples, which as a whole constitutes the training of a neural network. Although backpropagation has been very successful in solving problems, it comes with its own set of challenges.
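
To make the single-perceptron updates above concrete, here is a rough NumPy sketch of gradient descent with the L1 loss. The toy data, learning rate and number of epochs are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)      # randomly initialised weights
b = rng.normal()            # randomly initialised bias
lr = 0.01                   # learning rate (eta)

# toy training set: inputs X and targets Y generated from a known linear rule
X = rng.normal(size=(100, 3))
true_w, true_b = np.array([1.5, -2.0, 0.5]), 0.3
Y = X @ true_w + true_b

for epoch in range(50):
    for x, y in zip(X, Y):
        y_hat = np.dot(w, x) + b          # forward pass: prediction
        grad_pred = -np.sign(y - y_hat)   # dL/dy_hat for L = |y - y_hat|
        grad_w = grad_pred * x            # chain rule: dL/dw
        grad_b = grad_pred                # dL/db
        w -= lr * grad_w                  # weight update
        b -= lr * grad_b                  # bias update

print(w, b)  # should move towards true_w and true_b
```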

Problems with backpropagation

Learning through this procedure comes with the following problems:

  1. Fixed problem solving: Unlike our brain, which is capable of solving many kinds of problems, a neural network is only good at solving one problem. Its weights are tuned to that task, and if we train it on another problem, the information stored earlier is lost.
  2. Computational time: While training, each example has to go through a forward pass and then a backward pass, during which no other example can be processed. This is in contrast to our brains, which can process information simultaneously.
  3. Gradient accumulation: The gradients of each neuron have to be stored along with the weights, which can become very large for bigger neural networks.
  4. Gradient calculation: Backpropagation cannot work if the operations in the forward pass are not differentiable.

Forward-forward pass

In a new paper, Hinton introduces the forward-forward algorithm, inspired by how our brains work. The idea behind this algorithm is to replace the forward and backward pass with two forward passes that have different objectives. The first is a positive pass that tries to increase the goodness of the network, while the second is a negative pass that tries to decrease it. This has a couple of benefits. First, the network can be trained layer by layer without waiting for a backward pass, since there is no need to propagate gradients through the whole network. Second, gradients do not have to be carried backwards and stored for every neuron, which saves a lot of computational time.

The goodness function for a layer is simply the sum of the squares of the activities of the rectified linear neurons in that layer. The aim of the learning is to have a threshold such that real data has goodness above this threshold and negative data has goodness below it. More specifically, the aim is to correctly classify input vectors as positive data or negative data, where the probability that an input vector is positive (i.e. real) is given by applying the logistic function σ to the goodness minus some threshold θ:

p(positive) = σ( Σj yj² − θ )

where yj is the activity of hidden unit j before layer normalisation. The negative data may be predicted by the neural net using top-down connections, or it may be supplied externally.
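
As a rough sketch of this layer-local objective, the snippet below computes the goodness of a ReLU layer and the resulting probability of an input being positive. The layer sizes, weight scale and threshold value are assumptions for illustration; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(784, 500)) * 0.01   # one hidden layer with assumed sizes
theta = 2.0                              # goodness threshold (assumed value)

def layer_activity(x, W):
    # rectified linear activities of the layer
    return np.maximum(0.0, x @ W)

def goodness(y):
    # sum of the squared activities of the ReLU units in the layer
    return np.sum(y ** 2, axis=-1)

def p_positive(x, W, theta):
    # probability that the input is positive (real) data:
    # logistic function of the goodness minus the threshold
    g = goodness(layer_activity(x, W))
    return 1.0 / (1.0 + np.exp(-(g - theta)))

x_pos = rng.normal(size=(8, 784))   # stand-in for real (positive) data
x_neg = rng.normal(size=(8, 784))   # stand-in for negative data

# the positive pass would adjust W to raise p_positive on x_pos,
# and the negative pass would adjust W to lower it on x_neg
print(p_positive(x_pos, W, theta))
print(p_positive(x_neg, W, theta))
```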

The initial experimentation with this algorithm has been successful, with the forward-forward pass reaching almost the same results as backpropagation on popular datasets like MNIST and CIFAR-10. It will be interesting to see how this algorithm progresses in the future and how it helps us move closer to truly brain-like artificial neural networks.