Artificial Neural Network

Introduction

Artificial Neural Networks (ANNs) are a type of machine learning algorithm inspired by the structure and function of the human brain. They are composed of a large number of interconnected processing nodes, called artificial neurons, which work together to solve complex problems.

The math behind ANNs is based on linear algebra and calculus, specifically matrix operations and optimization techniques. At its core, an ANN is a series of matrix multiplications and non-linear activation functions.

The input data is passed through the layers of the network, where it is multiplied by a set of weights, and then passed through an activation function. The output of each layer is then passed as input to the next layer, until the final output is produced.
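This layer-by-layer computation can be sketched in plain Python. The layer sizes, weights, and biases below are illustrative placeholders, not values from the article:

```python
import math

def sigmoid(x):
    # Non-linear activation squashing any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    # One layer: each neuron computes a weighted sum of the inputs
    # plus a bias, passed through the activation function.
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]

# Illustrative 3-input -> 2-hidden -> 1-output network
x = [1.0, 0.0, 1.0]
hidden = layer_forward(x, weights=[[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]],
                       biases=[0.0, 0.1])
output = layer_forward(hidden, weights=[[0.7, -0.6]], biases=[0.05])
print(output)  # a single value in (0, 1)
```

Each layer's output list becomes the next layer's input list, exactly as described above.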

The goal of training an ANN is to find the best set of weights for the network, so that it can accurately predict the output for a given input. This is done by minimizing the error between the predicted output and the true output, using optimization techniques such as gradient descent.

The backpropagation algorithm is used to calculate the gradient of the error with respect to the weights, which is then used to update the weights in the opposite direction of the gradient, so as to minimize the error.

In short, the math behind ANNs combines matrix operations, non-linear activation functions, and optimization techniques to find the set of weights that lets the network accurately predict the output for a given input.

Application Example

Let’s take an example to see how an ANN¹ works.

| Obesity | Exercise | Smoking | Diabetic |
|---------|----------|---------|----------|
| 1       | 0        | 0       | 1        |
| 0       | 1        | 0       | 0        |
| 0       | 0        | 1       | 0        |
| 1       | 1        | 0       | 1        |

In the previous table, the value 1 represents true and the value 0 represents false. Notice that, in these examples, a person is diabetic exactly when they are obese. How can we create a program that learns from these four examples and predicts whether a new person has diabetes? This can be achieved by designing an Artificial Neural Network (ANN) with three input neurons that correspond to the Obesity, Exercise, and Smoking columns, and a single output neuron that corresponds to the Diabetic column.
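The table can be encoded directly as Python lists, with each row split into the three input features and the target label (the variable names here are my own):

```python
# Each row: [Obesity, Exercise, Smoking] -> Diabetic
inputs = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]
targets = [1, 0, 0, 1]

# Sanity check: in these four examples the label always
# matches the Obesity column.
assert all(row[0] == y for row, y in zip(inputs, targets))
```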


input layer       output layer

┌───┐
│   ├───────────────┐
└───┘               │
┌───┐             ┌─▼─┐
│   ├────────────►│   │
└───┘             └─▲─┘
┌───┐               │
│   ├───────────────┘
└───┘

Neural Network Operation

The input values are passed to the input neurons, and the network processes them to determine whether the person is diabetic. Each connection between neurons is controlled by a unique value called a weight, and each neuron holds a value computed from the previous neurons' values and the weights. The ultimate goal of the neural network is to find the weights that produce accurate predictions.

To achieve this, the network repeats several steps. First, it computes the values of the output layer's neurons, known as the prediction. Then it calculates the difference between the prediction and the actual output, and adjusts the weights to make the prediction more accurate. This process is repeated until the network reaches an acceptable level of accuracy.

Once the neural network is trained, it can be used to predict whether a person has diabetes by inputting values such as 1, 1, and 1 (indicating that the person is obese, exercises, and smokes), and the network will provide an answer indicating whether the person is diabetic or not.
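A minimal sketch of the prediction step for the 3-input, 1-output network. The weight values are purely illustrative; they stand in for what training would produce if, as in the table, only Obesity matters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(features, weights, bias):
    # Weighted sum of the three inputs plus bias, squashed into (0, 1)
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical learned weights: Obesity matters, the others do not
weights = [8.0, 0.0, 0.0]
bias = -4.0

p = predict([1, 1, 1], weights, bias)  # obese, exercises, smokes
print("diabetic" if p > 0.5 else "not diabetic")  # prints "diabetic"
```

With these weights, any obese person gets a prediction well above 0.5, and anyone else well below it, matching the pattern in the table.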

A. Forward propagation

Forward propagation is the process of passing input data through the network to produce the output: each layer multiplies its inputs by a set of weights, adds a bias, and applies an activation function, and its output feeds the next layer until the final output is produced.

As previously mentioned, each neuron in an Artificial Neural Network holds a value that ranges between 0 and 1, which is calculated using the values of the previous neurons, the weights, and a bias term. This section will delve into the details of how these values are calculated.

Notation:

  • Weights: $w_k$
  • Previous layer's neurons: $a^{(l-1)}_1, a^{(l-1)}_2, a^{(l-1)}_3$
  • Output neuron: $a^{(l)}_1$
  • Activation function: $\sigma$ (sigmoid)
  • Layer number: $l$

Let's introduce the notation $z^{(l)}_1$, which is the bias added to the dot product of the weights $w_k$ and the values of the previous layer's neurons $a^{(l-1)}_k$:

$$z^{(l)}_1 = b^{(l)}_1 + \sum_{k=1}^{n_{l-1}} a^{(l-1)}_k\, w_k$$

In order to keep the values in the neurons between 0 and 1, we are going to use the sigmoid function, which maps any real number into the interval $(0, 1)$:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Thus,

$$a^{(l)}_1 = \sigma\left(z^{(l)}_1\right), \qquad a^{(l)}_1 \in (0, 1)$$

This computation can be written in matrix form:

$$a^{(l)}_1 = \sigma\left(\begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} a^{(l-1)}_1 \\ a^{(l-1)}_2 \\ a^{(l-1)}_3 \end{bmatrix} + \begin{bmatrix} b^{(l)}_1 \end{bmatrix}\right)$$
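The matrix form is just a dot product plus a bias. A quick numerical check in plain Python, using made-up values for the weights, activations, and bias:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w = [0.4, -0.7, 0.2]       # w1, w2, w3
a_prev = [0.9, 0.3, 0.5]   # activations of the previous layer
b = 0.1                    # bias of the neuron

# z = b + sum_k a_k * w_k, then a = sigma(z)
z = b + sum(wk * ak for wk, ak in zip(w, a_prev))
a = sigmoid(z)
print(z, a)  # a always lands strictly between 0 and 1
```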

B. Backpropagation

Backpropagation is the process of adjusting the weights of the network in order to minimize the error between the predicted output and the true output. The backpropagation algorithm is used to calculate the gradient of the error with respect to the weights. This gradient is then used to update the weights in the opposite direction of the gradient, so as to minimize the error.

  1. Cost function

Let's take a simple example of a neural network with one input neuron and one output neuron:

$$a^{(l-1)} \to a^{(l)}$$

Once again,

$$z^{(l)} = b + a^{(l-1)} w, \qquad a^{(l)} = \sigma\left(z^{(l)}\right)$$

The goal is now to adjust the weights to make the prediction more accurate. Let's introduce the cost function, which measures the squared difference between the prediction and the actual output $y$.

$$C_1\left(a^{(l)}, y\right) = \left(a^{(l)} - y\right)^2$$

The cost function measures how inaccurate the network's predictions are: the smaller its value, the more accurate the prediction. Mathematically, the goal of training is to minimize this function.
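For instance, a prediction of 0.9 against a true label of 1 gives a small cost, while a prediction of 0.1 gives a much larger one (the values are chosen for illustration):

```python
def cost(prediction, target):
    # Squared difference between the prediction and the true output
    return (prediction - target) ** 2

print(cost(0.9, 1))  # ≈ 0.01, a nearly correct prediction
print(cost(0.1, 1))  # ≈ 0.81, a badly wrong prediction
```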

  2. Gradient descent

Now we need to understand how sensitive the cost function is to small changes in $w$, because, as described in Neural Network Operation above, the goal is to adjust the weights. Thus, we will determine the partial derivative of $C_1$ with respect to $w$ using the chain rule.

$$\frac{\partial C_1}{\partial w} = \frac{\partial C_1}{\partial a^{(l)}} \frac{\partial a^{(l)}}{\partial z^{(l)}} \frac{\partial z^{(l)}}{\partial w}$$

Indeed,

$$\frac{\partial C_1}{\partial a^{(l)}} = 2\left(a^{(l)} - y\right)$$

$$\frac{\partial a^{(l)}}{\partial z^{(l)}} = \frac{\partial \sigma\left(z^{(l)}\right)}{\partial z^{(l)}} = \sigma'\left(z^{(l)}\right)$$

$$\frac{\partial z^{(l)}}{\partial w} = \frac{\partial\left(b + a^{(l-1)} w\right)}{\partial w} = a^{(l-1)}$$

All together, it gives us

$$\frac{\partial C_1}{\partial w} = 2\left(a^{(l)} - y\right) \sigma'\left(z^{(l)}\right) a^{(l-1)}$$
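The formula can be sanity-checked numerically: the analytic gradient should match a finite-difference approximation of the cost. All numeric values below are arbitrary:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # Derivative of the sigmoid: sigma'(x) = sigma(x) (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

a_prev, b, w, y = 0.6, 0.1, 0.8, 1.0   # arbitrary example values

def cost(w):
    z = b + a_prev * w
    return (sigmoid(z) - y) ** 2

# Analytic gradient: 2 (a - y) * sigma'(z) * a_prev
z = b + a_prev * w
analytic = 2 * (sigmoid(z) - y) * sigmoid_prime(z) * a_prev

# Finite-difference approximation of dC/dw
eps = 1e-6
numeric = (cost(w + eps) - cost(w - eps)) / (2 * eps)
assert abs(analytic - numeric) < 1e-6
```

Here the gradient is negative, telling us the weight should be increased to push the prediction toward the target $y = 1$.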

We will apply this formula repeatedly, adjusting the weight a little each time, until the predictions are accurate. With $\alpha$ the learning rate, each update steps the weight in the direction opposite to the gradient:

$$w = w - \alpha \frac{\partial C_1}{\partial w}$$
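A minimal training loop for the one-weight model, repeatedly stepping the weight against the gradient so the cost shrinks (the learning rate and starting values are arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

a_prev, b, y = 0.6, 0.1, 1.0   # fixed input activation, bias, target
w, alpha = 0.0, 0.5            # initial weight and learning rate

def forward(w):
    z = b + a_prev * w
    return z, sigmoid(z)

for _ in range(1000):
    z, a = forward(w)
    # dC/dw = 2 (a - y) * sigma'(z) * a_prev
    grad = 2 * (a - y) * sigmoid(z) * (1 - sigmoid(z)) * a_prev
    w -= alpha * grad          # step opposite to the gradient

_, a = forward(w)
print(a)  # the prediction has moved close to the target y = 1
```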

In summary, forward propagation is the process of passing input data through the network to produce the output, while backpropagation is the process of adjusting the weights of the network to minimize the error between the predicted output and the true output.

Footnotes

  1. Artificial Neural Network