**Inner Workings**
**(Skip to [Activation Functions](#activation-functions-section) to see the indices of the activation functions)**
A neural network is a network of artificial neurons, each of which computes an output from its inputs. For each neuron, every result from the previous layer is multiplied by the weight this neuron stores for it, the products are summed, the neuron's bias is added, and the activation function provided for the layer is applied (a minimal code sketch follows the definitions below).
$`$ a_n = g_n(z_n) $`$
$`$ z_n = b_n + \sum_i w_{ni} a_{(n-1)i} $`$
$`a_n = `$ the neuron's result
$`g_n = `$ the neuron's activation function
$`z_n = `$ the neuron's value
$`b_n = `$ the neuron's bias
$`w_{ni} = `$ the neuron's weight associated with $`a_{(n-1)i}`$
$`a_{(n-1)i} = `$ the result of neuron $`i`$ of the previous layer
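Here is a minimal Python sketch of this forward calculation (illustrative only; the function and variable names are not taken from the library):

```python
import math

def neuron_forward(prev_results, weights, bias, activation):
    # z_n = b_n + sum_i( w_ni * a_(n-1)i )
    z = bias + sum(w * a for w, a in zip(weights, prev_results))
    # a_n = g_n(z_n)
    return activation(z)

# Example: one neuron with two inputs and a sigmoid activation.
sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
print(neuron_forward([0.5, -0.2], weights=[0.8, 0.3], bias=0.1, activation=sigmoid))
```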
---
Activation Functions
---
The current activation functions [ $`g(v)`$ ] are as follows (a lookup-table sketch follows the list):
- Linear (index = 0)
$`g(v) = v`$
$`g'(v) = 1`$
- Sigmoid (index = 1)
$`g(v) = \frac{1}{1 + e^{-v}}`$
$`g'(v) = g(v) * (1 - g(v))`$
- ReLU (index = 2)
$`g(v) = \max(0, v)`$
$`
g'(v) =
\begin{cases}
0.0, & \text{if } v \leq 0 \\
1.0, & \text{if } v > 0
\end{cases}
`$
- Tanh (index = 3)
$`g(v) = \frac{e^v - e^{-v}}{e^v + e^{-v}}`$
$`g'(v) = 1 - g(v)^2`$
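A small lookup table of these functions and their derivatives, keyed by the indices above (a sketch; the names are illustrative, not the library's API):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# index -> (g, g'), matching the list above
ACTIVATIONS = {
    0: (lambda v: v,           lambda v: 1.0),                              # Linear
    1: (sigmoid,               lambda v: sigmoid(v) * (1.0 - sigmoid(v))),  # Sigmoid
    2: (lambda v: max(0.0, v), lambda v: 0.0 if v <= 0 else 1.0),           # ReLU
    3: (math.tanh,             lambda v: 1.0 - math.tanh(v) ** 2),          # Tanh
}

g, g_prime = ACTIVATIONS[2]    # pick ReLU by its index
print(g(-1.5), g_prime(2.0))   # -> 0.0 1.0
```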
---
Learning
---
Learning involves adjusting the weights and biases of the neurons toward values that produce accurate outputs.
Gradient descent is used to find the "right" values of the weights and biases:
the gradients of the weights and biases should approach 0 ($`m = 0`$).
For each neuron, an error term is calculated, which is later used to compute the gradients.
To update the bias:
$`$ b_n = b_n - α \times ε_n $`$
$`b_n =`$ the neuron's bias.
$`α =`$ the learning rate.
$`ε_n =`$ the neuron's error term, which is also the gradient of the bias.
To update a weight:
$`$ w_{ni} = w_{ni} - α \times ε_n a_{(n-1)i} $`$
$`w_{ni} =`$ the neuron's weight associated with $`a_{(n-1)i}`$.
$`α =`$ the learning rate.
$`ε_n =`$ the neuron's error term; $`ε_n a_{(n-1)i}`$ is the gradient of the weight.
$`a_{(n-1)i} = `$ the result of the neuron of the previous layer.
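As a sketch (again with illustrative names), one gradient-descent step for a single neuron looks like this:

```python
def update_neuron(bias, weights, prev_results, error_term, learning_rate):
    # b_n = b_n - α * ε_n
    new_bias = bias - learning_rate * error_term
    # w_ni = w_ni - α * ε_n * a_(n-1)i
    new_weights = [w - learning_rate * error_term * a
                   for w, a in zip(weights, prev_results)]
    return new_bias, new_weights

print(update_neuron(bias=0.1, weights=[0.8, 0.3],
                    prev_results=[0.5, -0.2], error_term=0.04, learning_rate=0.01))
```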
Error terms are different for the output layer and the hidden layers.
---
Methods of Learning
---
The current methods of learning are as follows:
- Mean Squared Error (MSE) using Back Propagation (BPG) ***learn_bpg_mse***
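For reference, the error $`E`$ that the gradients below differentiate is the squared error over the output neurons; a common form (the exact scaling constant used by ***learn_bpg_mse*** is an assumption here, chosen so the derivatives below carry no extra constant) is:
$`$ E = \frac{1}{2} \sum_k (a_k - t_k)^2 $`$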
---
The Output Layer
---
The output neuron's error term is as follows:
$`$ ε_k = \frac{∂E}{∂b_k} = (a_k - t_k) g'_k(z_k) $`$
$`t_k =`$ the expected (target) value of the output neuron.
$`a_k =`$ the neuron's result.
$`g_k' = `$ the derivative of the activation function.
$`z_k =`$ the neuron's value.
The weight's gradient is just the error term multiplied by the previous neuron's result.
$`$ \frac{∂E}{∂w_{jk}} = ε_k \times a_j $`$
$`ε_k = `$ the error term of the output neuron.
$`w_{jk} = `$ the weight whose gradient is to be calculated (connecting neuron $`j`$ to output neuron $`k`$).
$`a_j = `$ the result of the previous neuron associated with $`w_{jk}`$.
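A sketch of these two formulas (illustrative names; the derivative function would come from the activation lookup above):

```python
def output_error_terms(results, targets, values, d_activation):
    # ε_k = (a_k - t_k) * g'_k(z_k) for every output neuron k
    return [(a - t) * d_activation(z)
            for a, t, z in zip(results, targets, values)]

def output_weight_gradient(error_term_k, prev_result_j):
    # ∂E/∂w_jk = ε_k * a_j
    return error_term_k * prev_result_j

eps = output_error_terms([0.7, 0.2], [1.0, 0.0], [0.85, -1.4],
                         d_activation=lambda z: 1.0)   # linear output layer
print(eps, output_weight_gradient(eps[0], 0.5))
```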
---
The Hidden Layer
---
The hidden neuron's bias gradient is calculated recursively: every neuron of the next layer
connected to this hidden neuron is taken into account when calculating the gradient.
An error term $`ε_j`$ is then defined for hidden neuron $`j`$.
$`$ ε_j = \frac{∂E}{∂b_j} = g'_j(z_j) \times \sum_k ε_k w_{jk} $`$
$`g'_j =`$ the derivative of the activation function.
$`z_j =`$ this neuron's value.
$`ε_k =`$ the error term of output neuron $`k`$.
$`w_{jk} =`$ the weight connecting this hidden neuron $`j`$ to neuron $`k`$ of the next layer.
Hence, the weight gradient can be calculated in the same way as above.
$`$ \frac{∂E}{∂w_{ij}} = ε_j \times a_i $`$
$`w_{ij} = `$ the weight whose gradient is to be calculated.
$`a_i = `$ the result of the previous neuron.
$`ε_j = `$ the error term of this hidden neuron.
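A sketch of the hidden-layer error term (illustrative; `next_weights[j][k]` is assumed to hold $`w_{jk}`$):

```python
def hidden_error_terms(values, d_activation, next_error_terms, next_weights):
    # ε_j = g'_j(z_j) * Σ_k( ε_k * w_jk ) for every hidden neuron j
    return [d_activation(z) * sum(e_k * next_weights[j][k]
                                  for k, e_k in enumerate(next_error_terms))
            for j, z in enumerate(values)]
```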
---
For deeper neural networks, the recursion continues layer by layer.
$`$ ε_j = g'_j(z_j) \times \sum_k ε_k w_{jk} $`$
$`$ ε_i = g'_i(z_i) \times \sum_j ε_j w_{ij} $`$
$`$ ε_h = g'_h(z_h) \times \sum_i ε_i w_{hi} $`$
and so on.
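Put together, the backward pass walks from the last layer toward the first, reusing the error terms of the layer just processed. A sketch of that loop follows (the data layout is an assumption for illustration, not the library's):

```python
def backward_error_terms(layer_values, layer_d_activations, weights, output_error_terms):
    # weights[l][j][k]    = weight from neuron j in layer l to neuron k in layer l+1
    # layer_values[l][j]  = z_j of layer l
    error_terms = [None] * len(layer_values)
    error_terms[-1] = output_error_terms
    for l in range(len(layer_values) - 2, -1, -1):          # walk backwards
        error_terms[l] = [
            layer_d_activations[l](z) *
            sum(e_k * weights[l][j][k] for k, e_k in enumerate(error_terms[l + 1]))
            for j, z in enumerate(layer_values[l])
        ]
    return error_terms
```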