Hello everyone, it's been a while. Between end-semester exams and a few side projects I haven't been able to post regularly, but all of that is done and dusted, so let's continue with the book.
Gradient Descent with multiple inputs
Using gradient descent with a single input and a single weight is rather straightforward: calculate the prediction, compare it with the target to get the error, use the derivative of the error with respect to the weight to decide how to update the weight, and try again. Rinse and repeat until the error is as small as we want.
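As a minimal sketch of that loop (the input, target, weight and learning rate below are made-up numbers purely for illustration):

```python
# Single-input gradient descent, in the spirit of the book's examples.
input = 2.0     # made-up input value
goal = 0.8      # made-up target
weight = 0.5    # made-up starting weight
alpha = 0.1     # made-up learning rate

for iteration in range(20):
    pred = input * weight
    delta = pred - goal            # raw (signed) error
    error = delta ** 2             # squared error
    weight_delta = delta * input   # slope of the error w.r.t. the weight (up to a constant)
    weight -= alpha * weight_delta # step against the slope
    print(f"Error: {error:.5f}  Prediction: {pred:.5f}")
```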
Consider inputs of varying magnitudes: an input with a very large value produces a correspondingly steeper slope, so any change to the weight attached to it has a significantly larger impact on the network's performance. A way to mitigate this is to use normalization techniques so that the data gets constrained into more manageable, well-defined ranges. The book also shows an experiment where one weight is frozen while the others keep training. The error still reaches a minimum with respect to that frozen weight, but instead of the 'point' moving over the error curve, the curve itself shifts towards the left, i.e. the negative direction. The point remains "stationary" while the function itself is being transformed by the updates to the other weights.
Another important thing this case tells us is that the error is shared between the inputs: every weight is updated from the same error signal.
This leads to a rather dangerous conclusion: with many inputs, if a weakly contributing input happens to drive the shared error to zero, i.e. to a minimum, the network stops learning altogether. That is not something we want, because inputs with a far greater contribution get shut out before they have learned their proper weights.
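To make the shared-error idea concrete, here is a minimal sketch of gradient descent with multiple inputs. The input values, target and learning rate are made-up; the key point is that a single delta is spread to every weight through its own input, so inputs with large magnitudes get large weight updates.

```python
import numpy as np

# Gradient descent with multiple inputs; all values are made-up.
inputs = np.array([8.5, 0.65, 1.2])    # note the very different magnitudes
weights = np.array([0.1, 0.2, -0.1])
goal = 1.0
alpha = 0.01

for iteration in range(5):
    pred = inputs.dot(weights)
    delta = pred - goal                 # one shared error signal
    weight_deltas = delta * inputs      # each weight's update scales with its input
    weights -= alpha * weight_deltas
    print(f"Error: {delta ** 2:.5f}  Weight deltas: {weight_deltas}")
```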
Gradient Descent with multiple outputs
The procedure mirrors the single-output case: compute a delta for each output by comparing its prediction with its target, then multiply each delta by the input value to get the weight_delta for the corresponding weight.
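A minimal sketch with one input and several outputs (again, the numbers are made-up):

```python
import numpy as np

# Gradient descent with a single input and multiple outputs; all values are made-up.
input = 0.65
weights = np.array([0.3, 0.2, 0.9])
goals = np.array([0.1, 1.0, 0.1])      # one target per output
alpha = 0.1

for iteration in range(5):
    preds = input * weights
    deltas = preds - goals              # one delta per output
    weight_deltas = deltas * input      # one weight_delta per weight
    weights -= alpha * weight_deltas
    print(f"Errors: {deltas ** 2}")
```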
Working with a real dataset
This part of the book deals with the MNIST database and it can be summarised as follows:
The MNIST dataset is a collection of 70,000 handwritten digits, each stored as a 28x28 pixel grayscale image. It is a popular dataset for training neural networks to recognize handwritten digits.
A neural network is a machine learning algorithm that can learn to perform tasks by analyzing data. It is made up of interconnected nodes, each of which performs a simple mathematical operation.
To train a neural network to recognize handwritten digits using the MNIST dataset, we would first need to preprocess the data. This involves flattening each 28x28 image into a vector of 784 values, one per pixel.
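As a rough sketch, assuming the images have already been loaded into a NumPy array of shape (num_images, 28, 28), flattening (and scaling the pixel values into the 0-1 range) might look like this; the array below is a fake stand-in for the real data:

```python
import numpy as np

# Stand-in for MNIST: a uint8 array of shape (num_images, 28, 28).
images = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-dimensional vector
# and scale pixel values from 0-255 down to 0-1.
flattened = images.reshape(len(images), 28 * 28) / 255.0
print(flattened.shape)   # (100, 784)
```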
Next, we would need to design a neural network architecture. The network should have 784 input nodes, one for each pixel in the input image. It should also have 10 output nodes, one for each possible digit (0-9).
We would then need to train the neural network using the MNIST dataset. This involves feeding the network the input images and comparing the predicted outputs to the actual outputs. The network will then adjust its weights to minimize the error.
Once the network is trained, it can be used to classify new handwritten digits. To do this, we would simply feed the network a new image and it will output the predicted digit.
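Putting these steps together, here is a minimal sketch of a single-layer network with 784 inputs and 10 outputs, trained with the same delta/weight_delta rule used above. The data is randomly generated to stand in for the real flattened MNIST images, and the learning rate and iteration count are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(1)

# Stand-in data: replace with the real flattened MNIST images and labels.
images = np.random.rand(100, 784)              # 100 flattened 28x28 images, values in 0-1
labels = np.random.randint(0, 10, size=100)    # digit labels 0-9
targets = np.zeros((100, 10))
targets[np.arange(100), labels] = 1            # one-hot encode the labels

weights = 0.2 * np.random.rand(784, 10) - 0.1  # 784 input nodes -> 10 output nodes
alpha = 0.001                                  # made-up learning rate

for iteration in range(50):
    total_error = 0.0
    for image, target in zip(images, targets):
        pred = image.dot(weights)                  # 10 raw scores, one per digit
        delta = pred - target                      # one delta per output node
        weights -= alpha * np.outer(image, delta)  # weight_delta = input * delta
        total_error += np.sum(delta ** 2)
    # print(total_error)  # uncomment to watch the error fall

# Classify a "new" image: the predicted digit is the output node with the highest score.
new_image = images[0]
print("Predicted digit:", np.argmax(new_image.dot(weights)))
```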
Visualizing Weight Values
An interesting way to visualize how a neural network learns is to look at the weight values. The weight values represent the strength of the connections between the nodes in the network.
For example, if the weight between an input node and an output node is high, it means that the network believes that there is a strong correlation between that pixel and the digit that the output node represents.
If we take the weights for a particular output node and print them out as an image, we can see which pixels have the highest correlation with that digit.
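For example, assuming `weights` is the 784x10 matrix from the training sketch above, the weights feeding a single output node can be reshaped back into a 28x28 grid and displayed with matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder weights of shape (784, 10); use the trained weights in practice.
weights = 0.2 * np.random.rand(784, 10) - 0.1

digit = 2                                        # which output node to visualize
weight_image = weights[:, digit].reshape(28, 28) # map each weight back to its pixel

plt.imshow(weight_image, cmap="gray")
plt.title(f"Weights for output node {digit}")
plt.colorbar()
plt.show()
```

Bright pixels in such an image are the ones the network has learned to associate most strongly with that digit.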