Recurrent Neural Networks

In [1]:
from ipypublish import nb_setup

Introduction

Recurrent Neural Nets or RNNs are systems that are designed to detect patterns present in data sequences. This makes them better suited to solve "prediction" problems, as compared to other types of DLNs. An example of a prediction problem is predicting the next word in a sentence, which is a fundamental problem in Language Modeling. The solution to this problem requires that the system take into account the variable number of words that came before, i.e., it should be able to "remember" previous data in the sequence. Just like ConvNets, RNNs were discovered in the late 1980s, but lay dormant until recently due to the difficulty in training them. These problems have been overcome in recent years, with the use of a type of RNN called Long Short Term Memory or LSTMs, as well as the increase in processing power and size of training datasets. Today RNNs are at the forefront of exciting new discoveries in Deep Learning, and some of the most important recent work in DLNs falls in the RNN domain.

What are Recurrent Neural Networks

In [2]:
#rnn36
nb_setup.images_hconcat(["DL_images/rnn36.png"], width=600)
Out[2]:

In order to motivate the need for RNNs, consider the following: The DLN architectures that we have seen so far implement a static mapping between input and output vectors, of the type $Y=f(X)$ as shown in Part (a) of Figure rnn36.

Instead, consider a system that is evolving with time, i.e., it is a Dynamical System. In this scenario, the system is subject to an input sequence $X_1,...,X_n$ in which successive values of $X_i$ are dependent (for example if each $X_i$ were an image, then the sequence represents a video clip). Furthermore the output vector $Y_i$ at time $i$ depends not just on the input vector $X_i$, but on past values of $X$ as well, i.e., $Y_i = f_i(X_1,...,X_i)$ as shown in Part (b) of Figure rnn36. If we were to try to solve this problem using a traditional Neural Network, then we would need to find a different function $f_i$ for each value of $i$, i.e., the function would depend on the size of the input sequence. Obviously this would be a formidable task, and it would be much nicer if we could find a solution which is not dependent on the size of the input sequence; this is precisely what RNNs do.

Systems whose behavior is time dependent are commonly encountered in practice and are usually studied by postulating a Hidden Variable or Hidden State $Z_i$ (also called a State Variable). The evolution of the system state obeys the following recursion:

$$ Z_{i+1} = w_{i+1}(Z_i,X_i) $$

while the output is a function of the current state and is given by

$$ Y_{i+1} = v_{i+1}(Z_{i+1}) $$

The Hidden Variable sequence $Z_i$ captures the lossy history of the $X_i$ sequence, and hence serves as a type of memory. If we assume that the functions $v$ and $w$ do not change with time, then these equations reduce to:

$$ Z_{i+1} = w(Z_i,X_i) \quad \quad (\textbf{eqn1}) $$

$$ Y_{i+1} = v(Z_{i+1}) \quad \quad (\textbf{eqn2}) $$

Traditional Systems Theory makes the additional simplifying assumption that these functions are linear so that

$$ Z_{i+1} = WZ_i + UX_i $$

$$ Y_{i+1} = VZ_{i+1} $$

The use of DLNs enables us to approximate non-linear functions to a high degree of accuracy using the training dataset, so that this linearity assumption is no longer needed. We do however assume that the vectors $Z_i$ and $X_i$ can be combined linearly, so that the resulting equations are:

$$ Z_{i+1} = f(W Z_i + UX_i) \quad \quad (\textbf{eqn3}) $$

$$ Y_{i+1} = h(V Z_{i+1}) \quad \quad (\textbf{eqn4}) $$

Equations (eqn3) and (eqn4) are in a form in which they can be implemented using Neural Networks, with the functions $f$ and $h$ serving as the Activation Functions, as shown in Figure rnn1.

In [3]:
#rnn1
nb_setup.images_hconcat(["DL_images/rnn1.png"], width=600)
Out[3]:

The figure shows three types of nodes:

$X{(n)}$: This represents the value of the input vector at time $n$.

$Z{(n)}$: This represents the value of the Hidden Layer vector at time $n$.

$Y{(n)}$: This represents the value of the output vector at time $n$.

The LHS of Figure rnn1 shows a RNN with connections between the Hidden Layers at times $n$ and $n+1$ shown explicitly. The RHS of Figure rnn1 is a simplified representation of the same RNN. Each of the boxes in this figure represents a row of Neural Network nodes belonging to a single layer, of the type shown on the LHS, which have been omitted for the sake of clarity. Note that:

  • The weight matrix $U$ connects the nodes in the Input Layer with those in the Hidden Layer
  • The weight matrix $W$ connects the nodes in the Hidden Layer with the same set of nodes, but at the previous time instant. The square block on the self-loop to the Hidden Layer represents a time delay of one unit, and represents the fact that at time $n$, the nodes in that layer have as one of their inputs, the values that they had at time $n-1$
  • The weight matrix $V$ connects the nodes in the Hidden Layer with those in the Output Layer.

The fact that all these weight matrices do not change with time is a result of the time invariance assumption.
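The recursion in equations (eqn3) and (eqn4) can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the choice of $f=\tanh$ and $h=$ sigmoid, the zero initial state, the layer sizes, and the function name `rnn_forward` are all assumptions made for the example. The point to note is that the same matrices $U$, $W$ and $V$ are reused at every time step, so a single set of weights handles sequences of any length.

```python
import numpy as np

def rnn_forward(X, U, W, V, z0=None):
    """Apply Z_{i+1} = f(W Z_i + U X_i) and Y_{i+1} = h(V Z_{i+1}) along a sequence.

    X has shape (time, input_dim). f = tanh and h = sigmoid are assumptions.
    """
    z = np.zeros(W.shape[0]) if z0 is None else z0   # assumed zero initial state
    states, outputs = [], []
    for x in X:                                      # same U, W, V at every step
        z = np.tanh(W @ z + U @ x)                   # eqn3
        y = 1.0 / (1.0 + np.exp(-(V @ z)))           # eqn4
        states.append(z)
        outputs.append(y)
    return np.array(states), np.array(outputs)

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2          # illustrative sizes
U = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.5, size=(output_dim, hidden_dim))

# The same weights process a short and a long sequence.
Z_short, Y_short = rnn_forward(rng.normal(size=(5, input_dim)), U, W, V)
Z_long, Y_long = rnn_forward(rng.normal(size=(50, input_dim)), U, W, V)
print(Z_short.shape, Y_short.shape)   # (5, 4) (5, 2)
print(Z_long.shape, Y_long.shape)     # (50, 4) (50, 2)
```

Regardless of the sequence length, the hidden state at each step is a vector of fixed size, which is why it can only hold a compressed summary of the inputs seen so far.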

In [28]:
#rnn1c
nb_setup.images_hconcat(["DL_images/rnn1c.png"], width=600)
Out[28]:

While Figure rnn1 accurately captures the fact that the RNN design incorporates feedback, it does not lend itself easily to training of the type we have seen in Dense Feed-Forward Networks. In order to facilitate the reuse of techniques we have developed in previous chapters, we convert Figure rnn1 into an equivalent Dense Feed Forward network shown in Figure rnn1c, by a process called "unfolding". The unfolded network in the RHS of Figure rnn1c basically shows snapshots of the RNN at various points in time, and the fact that there is feedback between the nodes in the Hidden Layer is captured by the connections involving the weight matrix $W$.

As a result of the unfolding operation, the system becomes amenable to optimization using the Backprop algorithm. In addition, the depth or number of layers of the RNN is now dependent on the length of the input data sequence. This can result in a RNN with hundreds of Hidden Layers, hence the problems of training deep models with Backprop have a special resonance here. We also mentioned earlier that the Hidden Layer in RNNs keeps a "Lossy Memory" of the input sequence. It is lossy because this single layer has to capture data from an entire input sequence, which cannot be done without compressing the data in some fashion.

In order to gain an intuitive understanding of the way in which the RNN shown in Figure rnn1c operates, note the following: There is an analogy between the way a ConvNet operates by looking for patterns in localized patches of an image, and the way a RNN operates by looking for patterns in localized intervals in time. Just as a ConvNet slides a single Filter over the entire image, a RNN slides its own filter represented by the weight matrix $U$ over the entire input sequence, one input at a time. Hence RNNs implement a form of Translational Invariance, but they do this over the temporal axis, as opposed to spatial Translational Invariance seen in ConvNets. As a result of this property, the important part of the sequence can occur anywhere along the time axis, but still can be detected using a single weight matrix repeated at each point in time. There is however a crucial difference in the way a RNN operates compared to a ConvNet: Due to the feedback loop, the RNN pattern detector in the Hidden Layer is able to take advantage of the information that was learnt in the prior time steps from older data in the sequence. As a result the RNN is able to detect patterns that are spread over time, something that a ConvNet cannot do (in the spatial sense).

There is a fundamental difference between the internal representations that are created in the hidden layers of a ConvNet vs those that are created in the hidden layers of a RNN. The hidden layers in a ConvNet create a hierarchical representation, in which the representation at layer $r+1$ is at a higher level of abstraction compared to that for layer $r$. The hidden layer in a RNN on the other hand, does not add to the level of abstraction in the representation with successive time steps. Instead it captures patterns that are spread in time, but at the same level of abstraction. It is indeed possible to combine DLNs and RNNs and create deep RNNs (see the next section for examples), in which case higher level hidden layers capture time dependent patterns at different levels of abstraction.

The rest of this chapter is organized as follows: RNNs can be configured in several useful ways depending upon the problem they are being used to solve; some of these are described in Section Examples. In Section Training we discuss the training of RNNs and the problems that arise when doing so. In particular the Backprop algorithm can lead to issues such as the Vanishing Gradients Problem or the Exploding Gradients Problem, and in Section LSTMs we discuss a modified RNN architecture called LSTM (Long Short Term Memory) which was designed to solve these problems. In the final section we show how to model RNNs and LSTMs using Keras.

IMDB Movie Review Classification Using an RNN

In prior chapters we built models to classify IMDB Movie Reviews using Dense Feed Forward Networks (Chapter NNDeepLearning) and 1-D Convnets (Chapter ConvNetsPart1). We now show how to do this using a RNN.

We first address the problem of how to feed data into a RNN: As shown in Figure rnn39, Keras requires that data inputs into each RNN stage be in the form of a 1-D vector. Hence as Part (a) of the figure shows, a single training sample into an RNN is a 2-D matrix of shape $time\times features$, such that the $i^{th}$ row of the matrix represents the data vector that is fed into the $i^{th}$ stage of the RNN. As shown in Part (b) of the figure, a batch of samples into the RNN becomes a 3D tensor of shape $sample\times time\times features$.

All datasets to be fed into an RNN have to be first formatted into this shape. This raises the question of how to feed non-vector data into a RNN, the most common example of which is 3-D image data. A common way of doing so is by first passing the image data through a ConvNet, and then using the image feature vector that occurs at the end of the convolutional layers as an input. This enables us to feed a video clip into a RNN, such that images in the clip are fed into successive stages of the RNN.
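The shapes described above can be checked with a small NumPy sketch. The specific sizes (500 time steps, 32 features, 128 samples per batch) are the ones used later in this chapter; the arrays are filled with zeros purely for illustration.

```python
import numpy as np

time_steps, features, batch = 500, 32, 128   # sizes used later in this chapter

# One training sample: row i is the feature vector fed into stage i of the RNN.
sample = np.zeros((time_steps, features))

# A batch stacks samples along a new leading axis.
batch_of_samples = np.stack([sample] * batch)

print(sample.shape)            # (500, 32)
print(batch_of_samples.shape)  # (128, 500, 32)
```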

In [11]:
#rnn39
nb_setup.images_hconcat(["DL_images/rnn39.png"], width=600)
Out[11]:

We use the IMDB dataset that comes with Keras. The data has already been split into training and test samples and also tokenized, such that all words have been converted into integers. When loading this data, we limit ourselves to the top 10,000 words that occur in the dataset by passing max_features as the num_words argument. Furthermore each review is truncated after 500 words, with shorter reviews padded with zeroes, using the pad_sequences command and the maxlen parameter. At the end of these operations, each review is in the form of a 1-D vector of size 500 and the entire IMDB dataset is a matrix of shape $samples\times word\ tokens$.

In [2]:
from keras.datasets import imdb
from keras.preprocessing import sequence

max_features = 10000  # number of words to consider as features
maxlen = 500  # cut texts after this number of words (among top max_features most common words)
batch_size = 128  # mini-batch size (the value passed to model.fit below)

print('Loading data...')
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)
print(len(input_train), 'train sequences')
print(len(input_test), 'test sequences')

print('Pad sequences (samples x time)')
input_train = sequence.pad_sequences(input_train, maxlen=maxlen)
input_test = sequence.pad_sequences(input_test, maxlen=maxlen)
print('input_train shape:', input_train.shape)
print('input_test shape:', input_test.shape)
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
input_train shape: (25000, 500)
input_test shape: (25000, 500)

The RNN model for the IMDB problem is shown below. It is a basic RNN with 500 stages (since each movie review sample consists of 500 words) and a single Logit node for doing the binary classification.

  • In order to input a single word into the model, Keras does the following: A word is represented using a 1-Hot vector of size 10,000 (corresponding to the 10,000 words in the vocabulary), and it then passes through the Embedding Layer which converts it into its vector representation of size 32 by multiplying it with a matrix of size $10,000\times 32$ (note that this multiplication is not needed in the actual implementation, Keras just picks the $i^{th}$ row of the matrix if the token of the input word is $i$). Note that we are not using a pre-trained Embedding Layer, hence the best embedding for each word is learnt as part of the training process.

  • Keras inputs a single review into the model by presenting each of its 500 words in turn to successive stages of the RNN.

  • Keras inputs a batch of reviews into the model by creating a 3-D tensor of shape $(batch\ size, review\ size, feature\ size)$, which is (128, 500, 32) for our example. Hence during the forward pass, 128 reviews are fed in parallel into the model.
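The equivalence between the 1-Hot multiplication and the row lookup mentioned in the first bullet can be verified with a short NumPy sketch. The matrix `E` is a random stand-in for the learnt embedding matrix and the token index 42 is a hypothetical example; neither comes from the trained model.

```python
import numpy as np

vocab_size, embed_dim = 10000, 32
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, embed_dim))   # stand-in for the embedding matrix

token = 42                                     # hypothetical word index
one_hot = np.zeros(vocab_size)
one_hot[token] = 1.0

via_matmul = one_hot @ E    # 1-Hot vector times the 10,000 x 32 matrix
via_lookup = E[token]       # row lookup, no multiplication needed

print(via_lookup.shape)                       # (32,)
print(np.allclose(via_matmul, via_lookup))    # True
```

The lookup touches 32 numbers instead of 320,000, which is why the multiplication is avoided in practice.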

In [5]:
#rnn40
nb_setup.images_hconcat(["DL_images/rnn40.png"], width=600)
Out[5]:
In [5]:
from keras.layers import Dense
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN

model = Sequential()
model.add(Embedding(max_features, 32))
model.add(SimpleRNN(32))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
In [6]:
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 32)                2080      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33        
=================================================================
Total params: 322,113
Trainable params: 322,113
Non-trainable params: 0
_________________________________________________________________
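The parameter counts in the summary can be reproduced by hand. The short calculation below assumes that the SimpleRNN layer holds an input-to-hidden matrix, a hidden-to-hidden matrix and one bias vector, and that the Dense layer holds one weight vector and one bias, which matches the totals reported above.

```python
vocab_size, embed_dim, hidden_dim, output_dim = 10000, 32, 32, 1

# Embedding: one row of size 32 per word in the vocabulary.
embedding_params = vocab_size * embed_dim

# SimpleRNN: U (32 x 32), W (32 x 32) and a bias vector of size 32.
rnn_params = hidden_dim * embed_dim + hidden_dim * hidden_dim + hidden_dim

# Dense: V (32 x 1) plus a single bias.
dense_params = hidden_dim * output_dim + output_dim

print(embedding_params, rnn_params, dense_params)      # 320000 2080 33
print(embedding_params + rnn_params + dense_params)    # 322113
```

Note that the recurrent layer's parameter count does not depend on the number of time steps, since the same weights are reused in all 500 stages.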
In [7]:
history = model.fit(input_train, y_train,
                    epochs=100,
                    batch_size=128,
                    validation_split=0.2)
Epoch 1/100
157/157 [==============================] - 25s 148ms/step - loss: 0.5662 - acc: 0.7080 - val_loss: 0.4769 - val_acc: 0.7876
Epoch 2/100
157/157 [==============================] - 22s 138ms/step - loss: 0.3628 - acc: 0.8516 - val_loss: 0.4489 - val_acc: 0.8184
Epoch 3/100
157/157 [==============================] - 23s 145ms/step - loss: 0.2811 - acc: 0.8903 - val_loss: 0.3696 - val_acc: 0.8548
Epoch 4/100
157/157 [==============================] - 23s 144ms/step - loss: 0.2375 - acc: 0.9098 - val_loss: 0.3564 - val_acc: 0.8430
Epoch 5/100
157/157 [==============================] - 22s 137ms/step - loss: 0.2132 - acc: 0.9233 - val_loss: 0.3872 - val_acc: 0.8416
Epoch 6/100
157/157 [==============================] - 21s 134ms/step - loss: 0.1650 - acc: 0.9410 - val_loss: 0.3683 - val_acc: 0.8580
Epoch 7/100
157/157 [==============================] - 21s 133ms/step - loss: 0.1352 - acc: 0.9521 - val_loss: 0.5580 - val_acc: 0.8394
Epoch 8/100
157/157 [==============================] - 21s 134ms/step - loss: 0.1128 - acc: 0.9616 - val_loss: 0.4379 - val_acc: 0.8416
Epoch 9/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0918 - acc: 0.9688 - val_loss: 0.4639 - val_acc: 0.8524
Epoch 10/100
157/157 [==============================] - 21s 133ms/step - loss: 0.0769 - acc: 0.9739 - val_loss: 0.4772 - val_acc: 0.8678
Epoch 11/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0685 - acc: 0.9793 - val_loss: 0.7638 - val_acc: 0.8288
Epoch 12/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0456 - acc: 0.9856 - val_loss: 0.6962 - val_acc: 0.7726
Epoch 13/100
157/157 [==============================] - 22s 137ms/step - loss: 0.0356 - acc: 0.9892 - val_loss: 0.5905 - val_acc: 0.8368
Epoch 14/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0553 - acc: 0.9833 - val_loss: 0.7072 - val_acc: 0.8006
Epoch 15/100
157/157 [==============================] - 21s 136ms/step - loss: 0.0235 - acc: 0.9926 - val_loss: 0.6466 - val_acc: 0.8414
Epoch 16/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0333 - acc: 0.9880 - val_loss: 0.7729 - val_acc: 0.7922
Epoch 17/100
157/157 [==============================] - 21s 136ms/step - loss: 0.0241 - acc: 0.9931 - val_loss: 0.7891 - val_acc: 0.8072
Epoch 18/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0177 - acc: 0.9944 - val_loss: 0.7494 - val_acc: 0.8270
Epoch 19/100
157/157 [==============================] - 21s 136ms/step - loss: 0.0094 - acc: 0.9970 - val_loss: 0.8824 - val_acc: 0.8028
Epoch 20/100
157/157 [==============================] - 21s 136ms/step - loss: 0.0048 - acc: 0.9985 - val_loss: 0.8467 - val_acc: 0.8282
Epoch 21/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0100 - acc: 0.9968 - val_loss: 1.0301 - val_acc: 0.7628
Epoch 22/100
157/157 [==============================] - 21s 136ms/step - loss: 0.0051 - acc: 0.9987 - val_loss: 0.9628 - val_acc: 0.7922
Epoch 23/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0051 - acc: 0.9984 - val_loss: 0.9925 - val_acc: 0.8026
Epoch 24/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0047 - acc: 0.9985 - val_loss: 1.0366 - val_acc: 0.7850
Epoch 25/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0038 - acc: 0.9987 - val_loss: 0.9723 - val_acc: 0.8030
Epoch 26/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0043 - acc: 0.9987 - val_loss: 1.0013 - val_acc: 0.7978
Epoch 27/100
157/157 [==============================] - 21s 136ms/step - loss: 0.0076 - acc: 0.9980 - val_loss: 1.2578 - val_acc: 0.7592
Epoch 28/100
157/157 [==============================] - 21s 136ms/step - loss: 0.0034 - acc: 0.9987 - val_loss: 1.3130 - val_acc: 0.7430
Epoch 29/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0045 - acc: 0.9985 - val_loss: 1.2486 - val_acc: 0.7576
Epoch 30/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0038 - acc: 0.9987 - val_loss: 1.1152 - val_acc: 0.7880
Epoch 31/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0036 - acc: 0.9988 - val_loss: 1.1975 - val_acc: 0.7756
Epoch 32/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0044 - acc: 0.9987 - val_loss: 1.2073 - val_acc: 0.7676
Epoch 33/100
157/157 [==============================] - 21s 136ms/step - loss: 0.0033 - acc: 0.9988 - val_loss: 1.1394 - val_acc: 0.7836
Epoch 34/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0011 - acc: 0.9997 - val_loss: 1.1878 - val_acc: 0.7848
Epoch 35/100
157/157 [==============================] - 22s 138ms/step - loss: 0.0044 - acc: 0.9989 - val_loss: 1.2360 - val_acc: 0.7704
Epoch 36/100
157/157 [==============================] - 22s 142ms/step - loss: 0.0011 - acc: 0.9997 - val_loss: 1.2728 - val_acc: 0.7744
Epoch 37/100
157/157 [==============================] - 21s 137ms/step - loss: 0.0027 - acc: 0.9991 - val_loss: 1.3752 - val_acc: 0.7592
Epoch 38/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0041 - acc: 0.9983 - val_loss: 1.2901 - val_acc: 0.7710
Epoch 39/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0045 - acc: 0.9986 - val_loss: 1.3399 - val_acc: 0.7536
Epoch 40/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0044 - acc: 0.9987 - val_loss: 1.3645 - val_acc: 0.7626
Epoch 41/100
157/157 [==============================] - 23s 146ms/step - loss: 0.0023 - acc: 0.9992 - val_loss: 1.3305 - val_acc: 0.7664
Epoch 42/100
157/157 [==============================] - 22s 143ms/step - loss: 0.0015 - acc: 0.9995 - val_loss: 1.3996 - val_acc: 0.7552
Epoch 43/100
157/157 [==============================] - 21s 133ms/step - loss: 4.4946e-04 - acc: 0.9999 - val_loss: 1.4366 - val_acc: 0.7574
Epoch 44/100
157/157 [==============================] - 21s 133ms/step - loss: 2.6842e-04 - acc: 0.9999 - val_loss: 1.7124 - val_acc: 0.7120
Epoch 45/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0017 - acc: 0.9994 - val_loss: 1.5959 - val_acc: 0.7292
Epoch 46/100
157/157 [==============================] - 22s 141ms/step - loss: 8.9925e-04 - acc: 0.9997 - val_loss: 1.6372 - val_acc: 0.7218
Epoch 47/100
157/157 [==============================] - 21s 136ms/step - loss: 0.0030 - acc: 0.9990 - val_loss: 1.6065 - val_acc: 0.7232
Epoch 48/100
157/157 [==============================] - 21s 133ms/step - loss: 7.6052e-04 - acc: 0.9997 - val_loss: 1.5389 - val_acc: 0.7380
Epoch 49/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0011 - acc: 0.9997 - val_loss: 1.5803 - val_acc: 0.7468
Epoch 50/100
157/157 [==============================] - 21s 133ms/step - loss: 4.9535e-04 - acc: 0.9998 - val_loss: 1.5754 - val_acc: 0.7482
Epoch 51/100
157/157 [==============================] - 21s 133ms/step - loss: 0.0011 - acc: 0.9997 - val_loss: 1.5845 - val_acc: 0.7524
Epoch 52/100
157/157 [==============================] - 21s 133ms/step - loss: 0.0044 - acc: 0.9985 - val_loss: 1.7093 - val_acc: 0.7384
Epoch 53/100
157/157 [==============================] - 21s 135ms/step - loss: 0.0020 - acc: 0.9994 - val_loss: 1.6615 - val_acc: 0.7574
Epoch 54/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0050 - acc: 0.9987 - val_loss: 1.7627 - val_acc: 0.7318
Epoch 55/100
157/157 [==============================] - 21s 134ms/step - loss: 5.6182e-04 - acc: 0.9998 - val_loss: 1.9114 - val_acc: 0.7176
Epoch 56/100
157/157 [==============================] - 21s 133ms/step - loss: 0.0020 - acc: 0.9992 - val_loss: 1.6934 - val_acc: 0.7424
Epoch 57/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0019 - acc: 0.9995 - val_loss: 1.5983 - val_acc: 0.7500
Epoch 58/100
157/157 [==============================] - 21s 134ms/step - loss: 6.3050e-04 - acc: 0.9998 - val_loss: 1.6219 - val_acc: 0.7634
Epoch 59/100
157/157 [==============================] - 21s 133ms/step - loss: 2.1885e-04 - acc: 0.9999 - val_loss: 1.6265 - val_acc: 0.7604
Epoch 60/100
157/157 [==============================] - 21s 134ms/step - loss: 2.7765e-04 - acc: 0.9998 - val_loss: 1.7236 - val_acc: 0.7496
Epoch 61/100
157/157 [==============================] - 21s 134ms/step - loss: 5.1104e-04 - acc: 0.9998 - val_loss: 1.7307 - val_acc: 0.7554
Epoch 62/100
157/157 [==============================] - 21s 133ms/step - loss: 0.0014 - acc: 0.9995 - val_loss: 1.8284 - val_acc: 0.7284
Epoch 63/100
157/157 [==============================] - 21s 133ms/step - loss: 8.9904e-04 - acc: 0.9998 - val_loss: 1.7856 - val_acc: 0.7300
Epoch 64/100
157/157 [==============================] - 21s 132ms/step - loss: 3.0090e-04 - acc: 0.9999 - val_loss: 2.0609 - val_acc: 0.7168
Epoch 65/100
157/157 [==============================] - 21s 133ms/step - loss: 1.4129e-04 - acc: 0.9999 - val_loss: 1.8861 - val_acc: 0.7394
Epoch 66/100
157/157 [==============================] - 21s 133ms/step - loss: 0.0014 - acc: 0.9995 - val_loss: 1.9126 - val_acc: 0.7262
Epoch 67/100
157/157 [==============================] - 21s 133ms/step - loss: 0.0022 - acc: 0.9991 - val_loss: 2.1492 - val_acc: 0.7010
Epoch 68/100
157/157 [==============================] - 21s 133ms/step - loss: 1.5285e-04 - acc: 0.9999 - val_loss: 1.9580 - val_acc: 0.7378
Epoch 69/100
157/157 [==============================] - 21s 134ms/step - loss: 2.0725e-04 - acc: 0.9999 - val_loss: 2.3079 - val_acc: 0.6888
Epoch 70/100
157/157 [==============================] - 21s 134ms/step - loss: 3.2451e-05 - acc: 1.0000 - val_loss: 1.9635 - val_acc: 0.7458
Epoch 71/100
157/157 [==============================] - 21s 134ms/step - loss: 5.0015e-04 - acc: 0.9998 - val_loss: 1.9711 - val_acc: 0.7418
Epoch 72/100
157/157 [==============================] - 21s 132ms/step - loss: 0.0016 - acc: 0.9996 - val_loss: 1.9615 - val_acc: 0.7466
Epoch 73/100
157/157 [==============================] - 21s 133ms/step - loss: 6.9097e-04 - acc: 0.9998 - val_loss: 2.0379 - val_acc: 0.7404
Epoch 74/100
157/157 [==============================] - 21s 135ms/step - loss: 2.6221e-05 - acc: 1.0000 - val_loss: 2.0917 - val_acc: 0.7376
Epoch 75/100
157/157 [==============================] - 21s 134ms/step - loss: 3.0255e-05 - acc: 1.0000 - val_loss: 1.9486 - val_acc: 0.7688
Epoch 76/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0020 - acc: 0.9995 - val_loss: 2.0380 - val_acc: 0.7498
Epoch 77/100
157/157 [==============================] - 21s 132ms/step - loss: 0.0014 - acc: 0.9998 - val_loss: 2.1089 - val_acc: 0.7302
Epoch 78/100
157/157 [==============================] - 21s 133ms/step - loss: 1.8078e-04 - acc: 0.9999 - val_loss: 2.1894 - val_acc: 0.7212
Epoch 79/100
157/157 [==============================] - 21s 134ms/step - loss: 0.0016 - acc: 0.9995 - val_loss: 2.1526 - val_acc: 0.7210
Epoch 80/100
157/157 [==============================] - 21s 133ms/step - loss: 2.3403e-04 - acc: 0.9999 - val_loss: 2.2912 - val_acc: 0.7216
Epoch 81/100
157/157 [==============================] - 21s 134ms/step - loss: 8.6144e-05 - acc: 0.9999 - val_loss: 2.1191 - val_acc: 0.7364
Epoch 82/100
157/157 [==============================] - 21s 133ms/step - loss: 0.0013 - acc: 0.9997 - val_loss: 2.3310 - val_acc: 0.7094
Epoch 83/100
157/157 [==============================] - 21s 132ms/step - loss: 2.8061e-04 - acc: 0.9999 - val_loss: 2.3067 - val_acc: 0.7098
Epoch 84/100
157/157 [==============================] - 21s 132ms/step - loss: 4.5224e-04 - acc: 0.9998 - val_loss: 2.1959 - val_acc: 0.7298
Epoch 85/100
157/157 [==============================] - 21s 133ms/step - loss: 2.7094e-04 - acc: 0.9999 - val_loss: 2.2141 - val_acc: 0.7284
Epoch 86/100
157/157 [==============================] - 21s 132ms/step - loss: 6.9691e-04 - acc: 0.9998 - val_loss: 2.3401 - val_acc: 0.7196
Epoch 87/100
157/157 [==============================] - 21s 134ms/step - loss: 4.9009e-04 - acc: 0.9999 - val_loss: 2.1792 - val_acc: 0.7418
Epoch 88/100
157/157 [==============================] - 21s 134ms/step - loss: 5.4425e-04 - acc: 0.9999 - val_loss: 2.3057 - val_acc: 0.7160
Epoch 89/100
157/157 [==============================] - 21s 134ms/step - loss: 3.4485e-04 - acc: 0.9999 - val_loss: 2.2222 - val_acc: 0.7270
Epoch 90/100
157/157 [==============================] - 21s 133ms/step - loss: 1.7077e-04 - acc: 0.9999 - val_loss: 2.1758 - val_acc: 0.7398
Epoch 91/100
157/157 [==============================] - 21s 133ms/step - loss: 8.2131e-06 - acc: 1.0000 - val_loss: 2.3120 - val_acc: 0.7312
Epoch 92/100
157/157 [==============================] - 21s 133ms/step - loss: 0.0033 - acc: 0.9991 - val_loss: 2.2393 - val_acc: 0.7284
Epoch 93/100
157/157 [==============================] - 21s 136ms/step - loss: 0.0020 - acc: 0.9995 - val_loss: 2.2916 - val_acc: 0.7278
Epoch 94/100
157/157 [==============================] - 22s 140ms/step - loss: 2.8387e-04 - acc: 0.9999 - val_loss: 2.1180 - val_acc: 0.7456
Epoch 95/100
157/157 [==============================] - 22s 140ms/step - loss: 0.0017 - acc: 0.9995 - val_loss: 2.3299 - val_acc: 0.7148
Epoch 96/100
157/157 [==============================] - 22s 138ms/step - loss: 5.5296e-05 - acc: 1.0000 - val_loss: 2.4119 - val_acc: 0.7056
Epoch 97/100
157/157 [==============================] - 22s 140ms/step - loss: 0.0024 - acc: 0.9994 - val_loss: 2.2842 - val_acc: 0.7350
Epoch 98/100
157/157 [==============================] - 22s 139ms/step - loss: 8.3363e-04 - acc: 0.9997 - val_loss: 2.2574 - val_acc: 0.7376
Epoch 99/100
157/157 [==============================] - 22s 142ms/step - loss: 5.4921e-04 - acc: 0.9998 - val_loss: 2.3752 - val_acc: 0.7160
Epoch 100/100
157/157 [==============================] - 22s 139ms/step - loss: 1.7456e-04 - acc: 0.9999 - val_loss: 2.3985 - val_acc: 0.7356
In [9]:
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

The Accuracy and Loss curves show that the model starts to overfit after about 8-10 epochs, achieving a maximum accuracy of about 85%. This is also close to the accuracy that was achieved with the 1-D convnet (see Chapter ConvNetsPart1), even though that was done using an embedding of size 128. In order to increase the accuracy we may increase the number of words per review to more than 500. However this also increases the number of RNN stages, and later in this chapter we will show that this can cause problems with the Backprop algorithm.
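The epoch at which validation performance peaks can also be read directly off the logged metrics instead of the plots. The sketch below hard-codes the validation values for the first ten epochs, copied from the training log above; `best_epoch` is a hypothetical helper written for this example, not a Keras function.

```python
# Validation metrics for epochs 1-10, copied from the training log above.
val_loss = [0.4769, 0.4489, 0.3696, 0.3564, 0.3872,
            0.3683, 0.5580, 0.4379, 0.4639, 0.4772]
val_acc = [0.7876, 0.8184, 0.8548, 0.8430, 0.8416,
           0.8580, 0.8394, 0.8416, 0.8524, 0.8678]

def best_epoch(values, mode):
    """Return the 1-based epoch with the lowest ('min') or highest ('max') value."""
    pick = min if mode == "min" else max
    return pick(range(len(values)), key=lambda i: values[i]) + 1

print(best_epoch(val_loss, "min"))   # 4
print(best_epoch(val_acc, "max"))    # 10
```

Within this window the validation loss is lowest at epoch 4, while the validation accuracy fluctuates and reaches its highest value at epoch 10. In practice the Keras `EarlyStopping` callback can be used to stop training automatically once a monitored metric stops improving.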

Examples of RNN Architectures

RNNs with Stacked Layers

In [12]:
#rnn48
nb_setup.images_hconcat(["DL_images/rnn48.png"], width=600)
Out[12]:

Figure rnn48 shows a RNN with multiple hidden layers, usually referred to as a Deep RNN. This system incorporates ideas from Dense Feed Forward Networks into the RNN, and enables the system to simultaneously create:

  1. Higher level hierarchical representations of the input data, and
  2. Representations that capture temporal patterns in the data at each level of the hierarchy.

This architecture is used quite commonly in current RNN systems.

The Keras code for a RNN Model with 4 stacked layers is shown below. Stacked recurrent layers require the return_sequences flag to be set to True in the intermediate layers, as shown. This causes Keras to return the full sequence of outputs, one per timestep, which is needed to feed the following layer. The model.summary() command shows the output shape from the intermediate layers as a 3D tensor of size $(batch\ size, timesteps, output\ features)$.

In [10]:
model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32))  # This last layer only returns the last outputs.
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_2 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_3 (SimpleRNN)     (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_4 (SimpleRNN)     (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_5 (SimpleRNN)     (None, 32)                2080      
=================================================================
Total params: 328,320
Trainable params: 328,320
Non-trainable params: 0
_________________________________________________________________
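The parameter counts in the summary above can be checked by hand. The sketch below assumes the standard SimpleRNN parameterization (an input-to-hidden matrix, a hidden-to-hidden matrix, and one bias per unit); the helper names are illustrative and not part of Keras.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry.
    return vocab_size * embed_dim


def simple_rnn_params(input_dim, units):
    # Input-to-hidden weights + hidden-to-hidden weights + one bias per unit.
    return input_dim * units + units * units + units


embedding = embedding_params(10000, 32)
rnn_layer = simple_rnn_params(32, 32)
total = embedding + 4 * rnn_layer

print(embedding, rnn_layer, total)  # 320000 2080 328320
```

Every SimpleRNN layer has the same count because each one receives 32 features per timestep and has 32 units. The count does not depend on the number of timesteps, since the same weights are reused at every stage.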

Bi-Directional RNNs

In [10]:
#rnn38
nb_setup.images_hconcat(["DL_images/rnn38.png"], width=600)
Out[10]:

Figure rnn38 shows an important RNN architecture called the Bi-Directional RNN. As the name implies, this system incorporates two sets of hidden layers. The first set is the conventional hidden layer, with connections going from left to right, while the second set has connections going in the opposite direction, from right to left. Unlike the first set, the second set is able to spot patterns in the reversed sequence. The final output incorporates information from both the forward and backward hidden layers, usually by concatenating them. Such a design can be used when the input sequence is not being generated in real time. It is most useful when processing text sequences, and has been shown to substantially improve performance in many cases. Note that this architecture cannot be used for real time data sequences, since at test time future values are not yet available.

The code snippet below shows how to program a Bi-Directional RNN in Keras. The Bidirectional layer takes a recurrent layer instance as its first argument, which processes the input sequence in the forward order. It creates a second copy of that layer which processes the input sequence in the reverse order, and the final outputs of the two layers are then concatenated and fed into the Dense layer.

In [11]:
from keras import layers
from keras.models import Sequential

model = Sequential()
model.add(layers.Embedding(10000, 32))
model.add(layers.Bidirectional(layers.SimpleRNN(32)))
model.add(layers.Dense(1,activation='sigmoid'))
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_3 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
bidirectional (Bidirectional (None, 64)                4160      
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 65        
=================================================================
Total params: 324,225
Trainable params: 324,225
Non-trainable params: 0
_________________________________________________________________
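To make the forward/backward idea concrete, here is a minimal plain-Python sketch of a bi-directional pass. It uses a one-unit tanh RNN with hypothetical weights, so it illustrates the mechanism only and is not how Keras implements the Bidirectional layer.

```python
import math


def simple_rnn_final_state(xs, w_x, w_h, b=0.0):
    """Run a one-unit tanh RNN over xs and return the final hidden state."""
    h = 0.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h


def bidirectional_final_state(xs, fwd_weights, bwd_weights):
    """Concatenate the final states of a forward and a backward pass."""
    h_fwd = simple_rnn_final_state(xs, *fwd_weights)
    h_bwd = simple_rnn_final_state(list(reversed(xs)), *bwd_weights)
    return [h_fwd, h_bwd]


sequence = [0.5, -1.0, 2.0]
# Hypothetical weights (w_x, w_h); each direction has its own set.
features = bidirectional_final_state(sequence, (0.8, 0.3), (0.8, 0.3))
print(len(features))  # 2: one feature per direction
```

In the Keras model above each direction has 32 units, which is why the Bidirectional layer reports an output of size 64 and 4,160 = 2 x 2,080 parameters, and why the Dense layer has 64 + 1 = 65 parameters.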

RNNs with Dropout

Recall that the Dropout technique is used to reduce the amount of Overfitting in a DLN model. Initial applications of Dropout to RNNs were not successful, and it was later discovered that this was because a different random Dropout Mask was applied at successive stages of the RNN. It was subsequently shown that if the Dropout Mask is kept fixed across all stages for a single forward/backward pass through the RNN, then the amount of Overfitting is indeed reduced. Keras supports this type of RNN Dropout, and it can be turned on using two arguments of the recurrent layer (see below): the dropout argument sets the fraction of units dropped from the layer's inputs, while the recurrent_dropout argument sets the fraction of units dropped from the recurrent state.
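The idea of a fixed Dropout Mask can be sketched in a few lines of plain Python. This is only an illustration of reusing one mask across timesteps: the function names are hypothetical, and scaling the kept units by 1/(1 - rate) (the common "inverted dropout" convention) is an assumption of the sketch rather than a description of Keras internals.

```python
import random


def make_mask(size, rate, rng):
    """Inverted-dropout mask: 0 with probability `rate`, else 1/(1 - rate)."""
    keep = 1.0 - rate
    return [0.0 if rng.random() < rate else 1.0 / keep for _ in range(size)]


def apply_fixed_mask(sequence, rate, rng):
    """Sample one mask and reuse it at every timestep of the sequence."""
    mask = make_mask(len(sequence[0]), rate, rng)
    dropped = [[x * m for x, m in zip(step, mask)] for step in sequence]
    return dropped, mask


rng = random.Random(0)
# Hypothetical input: 4 timesteps, 5 features each, all ones for readability.
sequence = [[1.0] * 5 for _ in range(4)]
dropped, mask = apply_fixed_mask(sequence, rate=0.2, rng=rng)

# Every timestep sees the same dropped positions.
print(all(step == mask for step in dropped))  # True
```

The unsuccessful variant described above would instead call make_mask inside the loop over timesteps, so that each stage drops a different set of units.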

Earlier in this chapter we saw that the SimpleRNN model, when applied to the IMDB Movie Review problem, resulted in a good deal of overfitting. In the example below we use Dropout as part of the model. The results show that the Overfitting is indeed pushed out to later in the training process, to 60-70 epochs as opposed to 10 epochs, although in the log below the validation loss already reaches its minimum (0.4268) at epoch 18, where the validation accuracy also peaks at 0.8514. However, there is not much improvement in the best accuracy level, which has actually decreased.

In [12]:
from keras.layers import Dense
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN

model = Sequential()
model.add(Embedding(max_features, 32))
model.add(SimpleRNN(32,
                    dropout = 0.2,
                    recurrent_dropout = 0.2))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
In [13]:
history = model.fit(input_train, y_train,
                    epochs=100,
                    batch_size=128,
                    validation_split=0.2)
Epoch 1/100
157/157 [==============================] - 33s 203ms/step - loss: 0.7074 - acc: 0.5074 - val_loss: 0.6956 - val_acc: 0.5116
Epoch 2/100
157/157 [==============================] - 30s 193ms/step - loss: 0.6973 - acc: 0.5220 - val_loss: 0.6915 - val_acc: 0.5200
Epoch 3/100
157/157 [==============================] - 30s 193ms/step - loss: 0.6871 - acc: 0.5433 - val_loss: 0.6746 - val_acc: 0.5798
Epoch 4/100
157/157 [==============================] - 30s 191ms/step - loss: 0.6517 - acc: 0.6086 - val_loss: 0.7779 - val_acc: 0.5286
Epoch 5/100
157/157 [==============================] - 31s 195ms/step - loss: 0.6262 - acc: 0.6533 - val_loss: 0.6730 - val_acc: 0.5766
Epoch 6/100
157/157 [==============================] - 29s 186ms/step - loss: 0.4985 - acc: 0.7706 - val_loss: 0.5821 - val_acc: 0.7280
Epoch 7/100
157/157 [==============================] - 30s 193ms/step - loss: 0.4435 - acc: 0.8035 - val_loss: 0.5583 - val_acc: 0.7590
Epoch 8/100
157/157 [==============================] - 30s 192ms/step - loss: 0.4031 - acc: 0.8274 - val_loss: 0.7027 - val_acc: 0.7140
Epoch 9/100
157/157 [==============================] - 30s 193ms/step - loss: 0.3834 - acc: 0.8396 - val_loss: 0.7025 - val_acc: 0.7368
Epoch 10/100
157/157 [==============================] - 30s 193ms/step - loss: 0.3664 - acc: 0.8480 - val_loss: 0.4992 - val_acc: 0.8132
Epoch 11/100
157/157 [==============================] - 30s 193ms/step - loss: 0.3594 - acc: 0.8521 - val_loss: 0.7798 - val_acc: 0.7376
Epoch 12/100
157/157 [==============================] - 30s 193ms/step - loss: 0.3422 - acc: 0.8590 - val_loss: 0.5143 - val_acc: 0.8006
Epoch 13/100
157/157 [==============================] - 30s 192ms/step - loss: 0.3369 - acc: 0.8638 - val_loss: 0.4956 - val_acc: 0.8234
Epoch 14/100
157/157 [==============================] - 30s 193ms/step - loss: 0.3243 - acc: 0.8701 - val_loss: 0.4744 - val_acc: 0.8302
Epoch 15/100
157/157 [==============================] - 30s 193ms/step - loss: 0.3119 - acc: 0.8742 - val_loss: 0.4310 - val_acc: 0.8384
Epoch 16/100
157/157 [==============================] - 30s 194ms/step - loss: 0.2984 - acc: 0.8805 - val_loss: 0.4673 - val_acc: 0.8358
Epoch 17/100
157/157 [==============================] - 30s 193ms/step - loss: 0.2950 - acc: 0.8852 - val_loss: 0.8859 - val_acc: 0.7176
Epoch 18/100
157/157 [==============================] - 30s 194ms/step - loss: 0.3119 - acc: 0.8773 - val_loss: 0.4268 - val_acc: 0.8514
Epoch 19/100
157/157 [==============================] - 30s 193ms/step - loss: 0.2854 - acc: 0.8892 - val_loss: 0.4994 - val_acc: 0.8204
Epoch 20/100
157/157 [==============================] - 30s 193ms/step - loss: 0.2718 - acc: 0.8923 - val_loss: 0.5074 - val_acc: 0.8368
Epoch 21/100
157/157 [==============================] - 30s 193ms/step - loss: 0.2833 - acc: 0.8897 - val_loss: 0.6319 - val_acc: 0.7928
Epoch 22/100
157/157 [==============================] - 30s 192ms/step - loss: 0.2593 - acc: 0.9006 - val_loss: 0.4338 - val_acc: 0.8484
Epoch 23/100
157/157 [==============================] - 30s 194ms/step - loss: 0.2516 - acc: 0.9040 - val_loss: 0.5642 - val_acc: 0.8180
Epoch 24/100
157/157 [==============================] - 30s 194ms/step - loss: 0.2483 - acc: 0.9054 - val_loss: 0.4587 - val_acc: 0.8364
Epoch 25/100
157/157 [==============================] - 30s 193ms/step - loss: 0.2425 - acc: 0.9063 - val_loss: 0.6253 - val_acc: 0.8018
Epoch 26/100
157/157 [==============================] - 30s 193ms/step - loss: 0.2456 - acc: 0.9065 - val_loss: 0.5691 - val_acc: 0.8166
Epoch 27/100
157/157 [==============================] - 30s 193ms/step - loss: 0.2398 - acc: 0.9096 - val_loss: 0.4968 - val_acc: 0.8300
Epoch 28/100
157/157 [==============================] - 31s 195ms/step - loss: 0.2301 - acc: 0.9148 - val_loss: 0.5075 - val_acc: 0.8254
Epoch 29/100
157/157 [==============================] - 30s 194ms/step - loss: 0.2332 - acc: 0.9132 - val_loss: 0.5634 - val_acc: 0.8210
Epoch 30/100
157/157 [==============================] - 30s 190ms/step - loss: 0.2217 - acc: 0.9168 - val_loss: 0.5028 - val_acc: 0.8330
Epoch 31/100
157/157 [==============================] - 30s 193ms/step - loss: 0.2169 - acc: 0.9179 - val_loss: 0.5718 - val_acc: 0.8172
Epoch 32/100
157/157 [==============================] - 30s 193ms/step - loss: 0.2102 - acc: 0.9221 - val_loss: 0.4794 - val_acc: 0.8376
Epoch 33/100
157/157 [==============================] - 30s 192ms/step - loss: 0.2162 - acc: 0.9191 - val_loss: 0.4709 - val_acc: 0.8384
Epoch 34/100
157/157 [==============================] - 30s 194ms/step - loss: 0.2043 - acc: 0.9236 - val_loss: 0.5629 - val_acc: 0.8290
Epoch 35/100
157/157 [==============================] - 31s 195ms/step - loss: 0.2018 - acc: 0.9241 - val_loss: 0.5262 - val_acc: 0.8332
Epoch 36/100
157/157 [==============================] - 31s 201ms/step - loss: 0.1999 - acc: 0.9260 - val_loss: 0.5514 - val_acc: 0.8326
Epoch 37/100
157/157 [==============================] - 32s 204ms/step - loss: 0.1938 - acc: 0.9274 - val_loss: 0.5081 - val_acc: 0.8408
Epoch 38/100
157/157 [==============================] - 32s 201ms/step - loss: 0.2018 - acc: 0.9264 - val_loss: 0.5229 - val_acc: 0.8324
Epoch 39/100
157/157 [==============================] - 31s 200ms/step - loss: 0.1986 - acc: 0.9254 - val_loss: 0.5047 - val_acc: 0.8384
Epoch 40/100
157/157 [==============================] - 32s 202ms/step - loss: 0.1858 - acc: 0.9303 - val_loss: 0.5093 - val_acc: 0.8388
Epoch 41/100
157/157 [==============================] - 31s 195ms/step - loss: 0.1858 - acc: 0.9310 - val_loss: 0.6178 - val_acc: 0.8232
Epoch 42/100
157/157 [==============================] - 30s 191ms/step - loss: 0.1769 - acc: 0.9343 - val_loss: 0.5326 - val_acc: 0.8266
Epoch 43/100
157/157 [==============================] - 31s 200ms/step - loss: 0.1723 - acc: 0.9365 - val_loss: 0.5706 - val_acc: 0.8244
Epoch 44/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1699 - acc: 0.9380 - val_loss: 0.5882 - val_acc: 0.8222
Epoch 45/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1719 - acc: 0.9379 - val_loss: 0.6628 - val_acc: 0.8142
Epoch 46/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1755 - acc: 0.9366 - val_loss: 0.6323 - val_acc: 0.8142
Epoch 47/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1712 - acc: 0.9363 - val_loss: 0.6126 - val_acc: 0.8226
Epoch 48/100
157/157 [==============================] - 30s 191ms/step - loss: 0.1695 - acc: 0.9359 - val_loss: 0.6578 - val_acc: 0.8236
Epoch 49/100
157/157 [==============================] - 31s 196ms/step - loss: 0.1623 - acc: 0.9420 - val_loss: 0.7442 - val_acc: 0.8004
Epoch 50/100
157/157 [==============================] - 34s 218ms/step - loss: 0.1626 - acc: 0.9394 - val_loss: 0.6454 - val_acc: 0.8156
Epoch 51/100
157/157 [==============================] - 31s 200ms/step - loss: 0.1605 - acc: 0.9421 - val_loss: 0.5724 - val_acc: 0.8154
Epoch 52/100
157/157 [==============================] - 31s 195ms/step - loss: 0.1582 - acc: 0.9438 - val_loss: 0.5717 - val_acc: 0.7942
Epoch 53/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1585 - acc: 0.9425 - val_loss: 0.7980 - val_acc: 0.7984
Epoch 54/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1533 - acc: 0.9459 - val_loss: 0.6275 - val_acc: 0.8184
Epoch 55/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1447 - acc: 0.9485 - val_loss: 0.7149 - val_acc: 0.8106
Epoch 56/100
157/157 [==============================] - 31s 197ms/step - loss: 0.1433 - acc: 0.9479 - val_loss: 0.6634 - val_acc: 0.8164
Epoch 57/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1417 - acc: 0.9485 - val_loss: 0.6454 - val_acc: 0.8150
Epoch 58/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1458 - acc: 0.9477 - val_loss: 0.6386 - val_acc: 0.8078
Epoch 59/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1371 - acc: 0.9517 - val_loss: 0.6825 - val_acc: 0.8112
Epoch 60/100
157/157 [==============================] - 30s 194ms/step - loss: 0.1419 - acc: 0.9505 - val_loss: 0.5936 - val_acc: 0.8124
Epoch 61/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1361 - acc: 0.9520 - val_loss: 0.6631 - val_acc: 0.7912
Epoch 62/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1379 - acc: 0.9520 - val_loss: 0.6753 - val_acc: 0.8132
Epoch 63/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1285 - acc: 0.9560 - val_loss: 0.7988 - val_acc: 0.8014
Epoch 64/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1378 - acc: 0.9508 - val_loss: 0.6987 - val_acc: 0.8170
Epoch 65/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1315 - acc: 0.9538 - val_loss: 0.6677 - val_acc: 0.8060
Epoch 66/100
157/157 [==============================] - 30s 194ms/step - loss: 0.1292 - acc: 0.9556 - val_loss: 0.9279 - val_acc: 0.7758
Epoch 67/100
157/157 [==============================] - 31s 196ms/step - loss: 0.1330 - acc: 0.9524 - val_loss: 0.6766 - val_acc: 0.8108
Epoch 68/100
157/157 [==============================] - 31s 195ms/step - loss: 0.1286 - acc: 0.9556 - val_loss: 0.6717 - val_acc: 0.8082
Epoch 69/100
157/157 [==============================] - 31s 195ms/step - loss: 0.1293 - acc: 0.9538 - val_loss: 0.6495 - val_acc: 0.8076
Epoch 70/100
157/157 [==============================] - 31s 196ms/step - loss: 0.1257 - acc: 0.9564 - val_loss: 0.6978 - val_acc: 0.8056
Epoch 71/100
157/157 [==============================] - 34s 218ms/step - loss: 0.1236 - acc: 0.9572 - val_loss: 0.6664 - val_acc: 0.8084
Epoch 72/100
157/157 [==============================] - 32s 203ms/step - loss: 0.1264 - acc: 0.9574 - val_loss: 0.7215 - val_acc: 0.8116
Epoch 73/100
157/157 [==============================] - 32s 203ms/step - loss: 0.1161 - acc: 0.9595 - val_loss: 0.6979 - val_acc: 0.8086
Epoch 74/100
157/157 [==============================] - 31s 196ms/step - loss: 0.1171 - acc: 0.9589 - val_loss: 0.6830 - val_acc: 0.8080
Epoch 75/100
157/157 [==============================] - 32s 203ms/step - loss: 0.1219 - acc: 0.9566 - val_loss: 0.6877 - val_acc: 0.8152
Epoch 76/100
157/157 [==============================] - 33s 208ms/step - loss: 0.1227 - acc: 0.9554 - val_loss: 0.7327 - val_acc: 0.8090
Epoch 77/100
157/157 [==============================] - 32s 207ms/step - loss: 0.1138 - acc: 0.9608 - val_loss: 0.6984 - val_acc: 0.8118
Epoch 78/100
157/157 [==============================] - 32s 204ms/step - loss: 0.1205 - acc: 0.9567 - val_loss: 0.7042 - val_acc: 0.8108
Epoch 79/100
157/157 [==============================] - 30s 194ms/step - loss: 0.1214 - acc: 0.9589 - val_loss: 0.6767 - val_acc: 0.8074
Epoch 80/100
157/157 [==============================] - 30s 191ms/step - loss: 0.1108 - acc: 0.9621 - val_loss: 0.8082 - val_acc: 0.8044
Epoch 81/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1118 - acc: 0.9602 - val_loss: 0.7464 - val_acc: 0.8096
Epoch 82/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1100 - acc: 0.9624 - val_loss: 0.8293 - val_acc: 0.7924
Epoch 83/100
157/157 [==============================] - 30s 194ms/step - loss: 0.1138 - acc: 0.9599 - val_loss: 0.7283 - val_acc: 0.8074
Epoch 84/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1121 - acc: 0.9625 - val_loss: 0.7342 - val_acc: 0.8024
Epoch 85/100
157/157 [==============================] - 30s 194ms/step - loss: 0.1046 - acc: 0.9656 - val_loss: 0.7643 - val_acc: 0.8028
Epoch 86/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1015 - acc: 0.9650 - val_loss: 0.7842 - val_acc: 0.7974
Epoch 87/100
157/157 [==============================] - 30s 194ms/step - loss: 0.1001 - acc: 0.9650 - val_loss: 0.8347 - val_acc: 0.8010
Epoch 88/100
157/157 [==============================] - 31s 195ms/step - loss: 0.1074 - acc: 0.9632 - val_loss: 0.7701 - val_acc: 0.8038
Epoch 89/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1060 - acc: 0.9622 - val_loss: 0.8084 - val_acc: 0.7986
Epoch 90/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1034 - acc: 0.9651 - val_loss: 0.7394 - val_acc: 0.7998
Epoch 91/100
157/157 [==============================] - 30s 192ms/step - loss: 0.1044 - acc: 0.9639 - val_loss: 0.7577 - val_acc: 0.7928
Epoch 92/100
157/157 [==============================] - 30s 193ms/step - loss: 0.1013 - acc: 0.9651 - val_loss: 0.8262 - val_acc: 0.7910
Epoch 93/100
157/157 [==============================] - 31s 196ms/step - loss: 0.0989 - acc: 0.9657 - val_loss: 0.8126 - val_acc: 0.7980
Epoch 94/100
157/157 [==============================] - 33s 209ms/step - loss: 0.0982 - acc: 0.9669 - val_loss: 0.8116 - val_acc: 0.7920
Epoch 95/100
157/157 [==============================] - 31s 201ms/step - loss: 0.1006 - acc: 0.9658 - val_loss: 0.8379 - val_acc: 0.7996
Epoch 96/100
157/157 [==============================] - 32s 203ms/step - loss: 0.1018 - acc: 0.9652 - val_loss: 0.7555 - val_acc: 0.7828
Epoch 97/100
157/157 [==============================] - 31s 197ms/step - loss: 0.0978 - acc: 0.9668 - val_loss: 0.8253 - val_acc: 0.7904
Epoch 98/100
157/157 [==============================] - 31s 199ms/step - loss: 0.0930 - acc: 0.9688 - val_loss: 0.8245 - val_acc: 0.7940
Epoch 99/100
157/157 [==============================] - 31s 196ms/step - loss: 0.0917 - acc: 0.9683 - val_loss: 0.8288 - val_acc: 0.7872
Epoch 100/100
157/157 [==============================] - 31s 194ms/step - loss: 0.0963 - acc: 0.9669 - val_loss: 0.7995 - val_acc: 0.7976
In [14]:
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)  # 1-based, to match the epoch numbers in the log

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
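Besides reading the curves by eye, the epoch with the best validation metric can be pulled directly from the history lists. The helper below is a small sketch; best_epoch is a hypothetical name, not a Keras function, and the sample values are the first five val_acc entries copied from the training log above.

```python
def best_epoch(values, mode="max"):
    """Return (1-based epoch, value) of the best entry in a metric history."""
    pick = max if mode == "max" else min
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# First five val_acc values copied from the training log above.
val_acc_sample = [0.5116, 0.5200, 0.5798, 0.5286, 0.5766]
print(best_epoch(val_acc_sample))  # (3, 0.5798)
```

Applied to the full val_acc list from the run above, this should point to epoch 18, where the log shows val_acc: 0.8514. Passing mode="min" with val_loss gives the epoch of lowest validation loss in the same way.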