from ipypublish import nb_setup
Pattern Recognition is the task of classifying an image into one of several different categories. It has been the most common problem to which NNs have been applied since their inception, and over the years the increase in classification accuracy has served as an indicator of the state of the art in NN design. The MNIST pattern recognition problem served as a benchmark for DLN systems until recently, and in this section we introduce DLNs by describing this problem and the way in which DLNs are used to solve it.
#Mnist
nb_setup.images_hconcat(["DL_images/mnist.png"], width=600)
The MNIST database consists of 70,000 scanned images of handwritten digits from 0 to 9, a few of which are shown in Figure Mnist. Each image has been digitized into a $28 \times 28$ grid, with each of the 784 pixels in the grid assigned a quantized grayscale value between 0 and 1, with 0 representing white and 1 representing black. These numbers are then fed into a DLN as shown in Figure mnistNN. We will go into the details of the DLN architecture in Chapter NNDeepLearning, but at a high level the reader can observe that the DLN shown in Figure mnistNN has three layers of nodes (or neurons). The layer on the left is the Input Layer and consists of 784 neurons, with each neuron assigned the grayscale value of the corresponding pixel from the input image. Note that the 2-dimensional image has been stretched out into a 1-dimensional vector before being fed into the DLN. The middle layer is called the Hidden Layer and consists of 15 neurons, while the third layer is called the Output Layer and consists of 10 neurons. As the name implies, the Output Layer indicates the output of the DLN computation, so that ideally if the input corresponds to the digit $k$, $0 \leq k \leq 9$, then the $k^{th}$ output neuron should be 1, and the other 9 should be zero.
#mnistNN
nb_setup.images_hconcat(["DL_images/mnistNN.png"], width=600)
The neurons in the Input and Hidden Layers are fully connected, which means that each neuron in the Input Layer is connected to every neuron in the Hidden Layer (and the same goes for the neurons between the Hidden and Output Layers). Note that these connections are uni-directional, i.e., they exist in the forward direction only. Each of these connections is assigned a weight and each node in the Hidden and Output Layers is assigned a bias, so that a total of $784 \times 15 + 15 \times 10 + 15 + 10 = 11,935$ weight and bias parameters are needed to describe the network.
In an operational network, these 11,935 weight and bias parameters have to be set so that the DLN is able to do its job. The process of choosing these parameters is known as "training the network" and uses a learning algorithm that proceeds by iteration. After the training is complete, the DLN should be able to classify images of digits that were not part of the training dataset, which is known as the network's "Generalization Ability".
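The parameter count above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the list name `layer_sizes` is introduced here for illustration and does not appear elsewhere in the chapter.

```python
# Layer sizes from the text: 784 input, 15 hidden and 10 output neurons.
layer_sizes = [784, 15, 10]

total = 0
for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = fan_in * fan_out   # one weight per connection between the two layers
    biases = fan_out             # one bias per neuron in the receiving layer
    print(f"{fan_in} -> {fan_out}: {weights} weights + {biases} biases")
    total += weights + biases

print("total parameters:", total)
```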
The system operates as follows:
The 70,000 images in the MNIST database are divided into 3 groups: 50,000 images are used for training the DLN (called the training dataset), 10,000 images are used for choosing the model’s hyper-parameters (called the validation dataset) and the remaining 10,000 images are used for testing (called the test dataset).
The training process operates as follows:
The grayscale values for an image in the training dataset are fed into the Input Layer. The signals generated by this propagate through the network, a process called forward propagation, and the values at the Output Layer are compared with the desired output (for example, if the image is of the digit 2, then the desired output should be 0010000000). A measure of the difference between the desired and actual values is then fed back into the network and propagates backwards, using an algorithm known as Backprop. The information gleaned from this process is then used to modify all the link weights and node bias values, so that the desired and actual outputs are closer in the next iteration.
The process described above is repeated for each of the 50,000 images in the training set, and one such pass is known as a training Epoch. The network may be trained for multiple epochs until a stopping condition is satisfied, typically that the error rate on the validation dataset falls below some threshold.
Other than the weights and biases, there are some other important model parameters that are part of the training process, known as hyper-parameters. Some of these hyper-parameters are used to improve the optimization algorithm during training, while others are used to improve the model’s generalization ability. The main function of the validation dataset is to choose appropriate values for these hyper-parameters.
After the network is fully trained, the 10,000 images in the test dataset are used to test the DLN’s classification accuracy.
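A single iteration of the training process described above (forward propagation, comparison with the desired output, Backprop, and a weight update) can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation used later in the chapter: the ReLU and softmax activations and the cross-entropy error measure are borrowed from the Keras model shown later, while the random stand-in image, the initial weight scale and the learning rate `lr` are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one training image (784 grayscale values in [0, 1]) and its label.
x = rng.random(784)
label = 2
target = np.zeros(10)
target[label] = 1.0                       # desired output: 1 at index 2, 0 elsewhere

# Small random initial weights and zero biases for the 784-15-10 network.
W1 = rng.normal(0.0, 0.05, size=(15, 784))
b1 = np.zeros(15)
W2 = rng.normal(0.0, 0.05, size=(10, 15))
b2 = np.zeros(10)

def forward(x):
    """Forward propagation: returns hidden activations and output probabilities."""
    h = np.maximum(0.0, W1 @ x + b1)      # ReLU hidden layer
    z = W2 @ h + b2
    e = np.exp(z - z.max())               # softmax output layer (numerically stable)
    return h, e / e.sum()

def loss_fn(p):
    return -np.log(p[label])              # cross-entropy against the one-hot target

h, p = forward(x)
loss_before = loss_fn(p)

# Backprop: push the output error back through the network.
dz2 = p - target                          # error at the output layer
dW2, db2 = np.outer(dz2, h), dz2
dz1 = (W2.T @ dz2) * (h > 0)              # error at the hidden layer (ReLU derivative)
dW1, db1 = np.outer(dz1, x), dz1

# Gradient-descent update of every weight and bias (in place).
lr = 0.01
W1 -= lr * dW1
b1 -= lr * db1
W2 -= lr * dW2
b2 -= lr * db2

_, p_after = forward(x)
loss_after = loss_fn(p_after)
print("loss before:", loss_before, "loss after:", loss_after)
```

With a small enough learning rate, the error on this image is lower after the update, which is the sense in which the desired and actual outputs are closer in the next iteration.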
#epochAccuracy
nb_setup.images_hconcat(["DL_images/epochAccuracy.png"], width=1000)
Figure epochAccuracy plots the accuracy of the classification process as a function of the number of Epochs using the test dataset. As can be seen, the classification accuracy increases almost linearly initially, but after about 260 Epochs it does not increase beyond 82.25% or so (in other words, the NN classifies about 8,225 images correctly out of a total of 10,000 test images). The reasons why the testing accuracy plateaus, and what can be done to increase it further, form the subject of Section ImprovingModelGeneralization. In practice, the best accuracy that has been achieved by a state of the art NN on the MNIST classification problem is about 99.67%, i.e., only 33 mis-classifications out of 10,000!
Figure mnistNN also provides some insight into how the DLN is able to carry out the classification task, for the example in which the input is a handwritten zero. In a trained NN, four of the nodes in the Hidden Layer are tuned to recognize the presence of dark pixels in certain parts of the image, as shown at the bottom of the figure. This is done by appropriately choosing the weights on the links between the Input and Hidden Layers, which is also called filtering. As shown in the figure, the output node that corresponds to the digit 0 filters these 4 Hidden Layer nodes (by setting the weights on the links between the Hidden and Output Layers), such that its own output tends towards 1, while the outputs of the other nodes in the Output Layer tend towards 0.
nb_setup.images_hconcat(["DL_images/ILSVRC.png"], width=600)
While the MNIST dataset played an important role in the early years of Deep Learning, current systems have become powerful enough to handle much more complex image classification tests. ImageNet is an online dataset consisting of 16 million full-color images obtained by crawling the web. These images have been labeled using Amazon’s Mechanical Turk service, and some examples are given in Figure ILSVRC. A popular Machine Learning competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) uses a subset of 1.2 million of these images, drawn from 1,000 different categories, with 50,000 images used for validation and 150,000 for testing. In recent years the ILSVRC competition has served as a benchmark for the best DLN models, and their performance has exceeded that of a human test subject on this problem.
There are several frameworks that have emerged in the last few years for implementing DLN models. One of the most popular is Keras, which is built on top of an earlier framework called TensorFlow (both are from Google). We show an implementation of the MNIST classifier using Keras.
import keras
keras.__version__
from keras import models
from keras import layers
The MNIST dataset comes pre-loaded in Keras, and the next command loads it. The raw MNIST images are in a format such as .png or .jpg, but they have been pre-processed by Keras into the Grayscale format, so that each pixel is an integer between 0 and 255. Furthermore, all the images have been converted into a tensor format that can be processed using Keras. Also note that the images have already been split into training and test datasets.
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
Using the shape command, we obtain the dimensions of the tensor containing the training data. As shown below, the training dataset is a 3 dimensional tensor, with the first dimension representing the number of training images, and the next two dimensions representing the size of each image. Each image is a two dimensional tensor of size 28 x 28.
train_images.shape
The labels for the training data form an array of size 60,000. An element of this array is an integer label for the corresponding image.
len(train_labels)
train_labels
Similarly for test images:
test_images.shape
len(test_labels)
test_labels
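Keras returns 60,000 training images and 10,000 test images, whereas the earlier description used a 50,000 / 10,000 / 10,000 split. One way to obtain a separate validation set is to hold out part of the training data; the sketch below does this on index arrays only. The names `train_idx` and `val_idx` and the choice of a random split are assumptions for illustration (the training call later in this section uses the `validation_split` argument of fit instead).

```python
import numpy as np

n_train_total = 60_000      # size of the training set returned by mnist.load_data()
n_val = 10_000              # validation set size described earlier in the text

rng = np.random.default_rng(0)
shuffled = rng.permutation(n_train_total)   # random ordering of the image indices

val_idx = shuffled[:n_val]       # 10,000 indices held out for validation
train_idx = shuffled[n_val:]     # remaining 50,000 indices used for training

print(len(train_idx), len(val_idx))

# With the arrays loaded above, the split could then be applied as, for example:
# val_images, val_labels = train_images[val_idx], train_labels[val_idx]
```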
We can also plot one of the training images by using the matplotlib imshow command:
digit = train_images[4]
import matplotlib.pyplot as plt
import numpy as np
plt.imshow(digit, cmap = plt.cm.binary)
plt.show()
The contents of this image can also be displayed in matrix form. Note that each pixel is represented in the Grayscale format by an integer in the range 0 to 255.
digit
The Grayscale formatted image tensors cannot be fed directly into the model, but need to be pre-processed, as follows:
The DLN model in Figure mnistNN can only accept input that is in the form of a 1-D array, or a vector. So we need to convert the 2-D tensors into vectors, which is done using the numpy reshape command.
The data being fed into the NN has to be normalized so that all numbers are small in magnitude, otherwise the training does not work very well. This is done by dividing each of the Grayscale formatted pixel values by 255, so that they lie in the range [0,1] after normalization.
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
After re-shaping, the training dataset now consists of 60,000 images, each of which is a normalized vector of size 784.
train_images.shape
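There are two ways in which the output label can be specified in Keras: as an integer class index between 0 and 9 (the Sparse Categorical format, which is how the labels are loaded), or as a vector of length 10 with a 1 in the position of the correct class and 0 elsewhere (the Categorical format).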
The following commands convert all the labels in the training and test datasets to the categorical format.
from tensorflow.keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
train_labels
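To make the categorical format concrete, the following NumPy sketch builds the same kind of one-hot rows by hand. The helper `one_hot` is hypothetical and written only for illustration; it is not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer class labels into one-hot rows (hypothetical helper)."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0   # put a 1 in the column of each label
    return encoded

# Three example labels: each row has a single 1, in column 5, 0 and 2 respectively.
print(one_hot([5, 0, 2]))
```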
The NN model itself is specified using only 3 lines of Keras code.
Note that more layers can be added very simply to this model by repeating the command in Line 2.
network = models.Sequential()
network.add(layers.Dense(15, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
The compile command uses the following arguments:
Specifying the Loss Function: The Loss Function is a measure of the difference between the output of the model and the contents of the corresponding Label. The "Categorical Cross Entropy" Loss Function will be used. Note that if we had left the labels in the Sparse Categorical format, then the Loss Function "Sparse Categorical Cross Entropy" would be needed.
Specifying the Optimization Algorithm: In this case we will use 'sgd', which stands for Stochastic Gradient Descent. Keras provides other optimization algorithms as well; all of them are based on Backprop, which is an efficient way of computing gradients.
Specifying the output metrics: This gives Keras the list of output metrics to collect. In this case we collect the "accuracy" metric, which is one of the Keras specified metrics. User defined metrics can also be specified here.
network.compile(optimizer='sgd',
loss='categorical_crossentropy',
metrics=['accuracy'])
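The relationship between the two loss functions named above can be illustrated on a single example. The sketch below uses made-up output probabilities and computes the cross entropy once from a one-hot label and once from the integer label. It shows only the per-example formula; the Keras implementations also average over a batch and guard against taking the logarithm of zero.

```python
import numpy as np

# Hypothetical model output for one image: 10 probabilities that sum to 1.
probs = np.array([0.02, 0.01, 0.80, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02])

label = 2                       # sparse (integer) label
one_hot = np.zeros(10)
one_hot[label] = 1.0            # categorical (one-hot) label

categorical_ce = -np.sum(one_hot * np.log(probs))   # uses the one-hot vector
sparse_ce = -np.log(probs[label])                   # uses the integer label directly

print(categorical_ce, sparse_ce)                    # both equal -log(0.80)
```

Both expressions pick out the negative log of the probability assigned to the correct class, so the choice between them depends only on the label format.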
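We are finally ready to train the model, which is done using the fit command. In the call below, the fit command is given the training images and their labels, the number of epochs to train for (500), the batch size (128 images per weight update), and the fraction of the training data held out for validation (0.2).
Once the training starts, Keras provides a periodic update at the end of each epoch. This update contains the time it took to finish the epoch as well as the training and validation loss and accuracy values.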
history = network.fit(train_images, train_labels, epochs=500, batch_size=128, validation_split=0.2)
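The arguments of the fit call determine how many weight updates are performed. The short calculation below works this out for the values used above; it is a sketch of the arithmetic only, and it assumes that the validation fraction is removed from the 60,000 training images before batching.

```python
import math

num_images = 60_000          # images passed to fit
validation_split = 0.2       # fraction held out by fit for validation
batch_size = 128
epochs = 500

num_val = int(num_images * validation_split)          # images used only for validation
num_train = num_images - num_val                      # images used for weight updates
steps_per_epoch = math.ceil(num_train / batch_size)   # weight updates per epoch
total_steps = steps_per_epoch * epochs

print(num_train, num_val, steps_per_epoch, total_steps)
```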
The following command provides a list of the performance data that Keras collected during the training. In this case we have the training and validation loss and accuracy values, collected at the end of each training epoch.
history_dict = history.history
history_dict.keys()
We can plot the performance data collected during the training process using matplotlib. The plots shown below are typical for the loss and accuracy values as a function of the number of epochs. These plots are extremely important in interpreting and/or debugging the model, and in later chapters we will explain how this is done. By comparing the training and validation curves, we can figure out how well the model is generalizing beyond the training dataset.
import matplotlib.pyplot as plt
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
#epochs = range(1, len(loss) + 1)
# "bo" is for "blue dot"
plt.plot(epochs, loss, 'bo', label='Training loss')
# b is for "solid blue line"
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
plt.clf() # clear figure
acc_values = history_dict['accuracy']
val_acc_values = history_dict['val_accuracy']
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
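Besides plotting, the same history values can be inspected numerically. The sketch below uses a small made-up dictionary with the same keys as history.history (the numbers are invented for illustration and are not results from the model above) to find the epoch with the lowest validation loss and the final gap between training and validation accuracy.

```python
# Hypothetical history values, shaped like history.history after a 6-epoch run.
example_history = {
    "loss":         [1.90, 1.20, 0.80, 0.60, 0.50, 0.45],
    "val_loss":     [1.85, 1.25, 0.90, 0.75, 0.78, 0.84],
    "accuracy":     [0.40, 0.65, 0.76, 0.82, 0.85, 0.87],
    "val_accuracy": [0.42, 0.64, 0.74, 0.79, 0.78, 0.77],
}

val_losses = example_history["val_loss"]
# Index of the smallest validation loss; epochs are numbered from 1.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1
# A growing gap between training and validation accuracy suggests poorer generalization.
gap = example_history["accuracy"][-1] - example_history["val_accuracy"][-1]

print("lowest validation loss at epoch", best_epoch)
print("final train/validation accuracy gap:", round(gap, 2))
```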
Once we are satisfied with the training results, we can run the test data through the model using the evaluate command.
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)
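The accuracy reported above is the fraction of images whose most probable output class matches the label. The sketch below reproduces that calculation on a tiny made-up example with 4 images and 3 classes; the arrays `probs` and `labels` are hypothetical stand-ins for the model outputs and the one-hot test labels.

```python
import numpy as np

# Hypothetical softmax outputs for 4 images over 3 classes (rows sum to 1).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
# One-hot labels in the categorical format.
labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

predicted = probs.argmax(axis=1)     # most probable class per image
actual = labels.argmax(axis=1)       # class index encoded by each one-hot row
accuracy = (predicted == actual).mean()

print(predicted, actual, accuracy)   # 3 of the 4 predictions are correct
```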
The summary command provides a useful overview of the model: it lists all the layers, the shape of each layer's output tensor, and the number of parameters per layer.
network.summary()