Deep Learning with Python

In [1]:
%pylab inline
from ipypublish import nb_setup
Populating the interactive namespace from numpy and matplotlib

In this chapter we focus on implementing the same deep learning models in Python. This complements the examples presented in the previous chapter on using R for deep learning. We retain the same two examples. As we will see, the code here has almost the same syntax but runs in Python; the few changes needed are the obvious ones arising from programming differences between the two languages. However, it is useful to note that TensorFlow in Python may be used without extensive knowledge of Python itself. In this sense, packages for implementing neural nets have begun to commoditize deep learning.

Here is the code in Python to fit the model and then test it. Very little programming is needed. First, we import all the required libraries.

import pylab

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers.advanced_activations import LeakyReLU
from keras import backend
from keras.utils import to_categorical

Cancer Data

Next, we read in the data from the cancer data set we have seen earlier. We also structure the data to prepare it for use with TensorFlow.

#Read in data (note: the same file is used here for both the training and test sets)
import pandas as pd
tf_train = pd.read_csv("BreastCancer.csv")
tf_test = pd.read_csv("BreastCancer.csv")
n = len(tf_train)

X_train = tf_train.iloc[:,1:10].values
X_test = tf_test.iloc[:,1:10].values
y_train = tf_train.iloc[:,10].values
y_test = tf_test.iloc[:,10].values

#Recode the labels: 'malignant' becomes 1, everything else 0
idx = y_train
y_train = zeros(len(idx))
y_train[idx=='malignant'] = 1
y_train = y_train.astype(int)

idx = y_test
y_test = zeros(len(idx))
y_test[idx=='malignant'] = 1
y_test = y_test.astype(int)

Y_train = to_categorical(y_train,2)
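
As an aside, to_categorical simply converts the 0/1 integer labels into two-column one-hot vectors. A minimal illustrative example (with made-up labels) is:

#Illustrative only: what to_categorical does to a few 0/1 labels
import numpy as np
from keras.utils import to_categorical

labels = np.array([0, 1, 1, 0])
print(to_categorical(labels, 2))
#[[1. 0.]
# [0. 1.]
# [0. 1.]
# [1. 0.]]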

The model is then structured. We employ a fully-connected feed-forward network with five hidden layers of 512 neurons each, with dropout of 25 percent applied after each layer. The activation function used is the Leaky ReLU. The model is then compiled.

#Set up and compile the model
model = Sequential()
n_units = 512
data_dim = X_train.shape[1]

model.add(Dense(n_units, input_dim=data_dim))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(n_units))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(n_units))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(n_units))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(n_units))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(2))
model.add(Activation('sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

Finally, the model is fit to the data. We use a batch size of 32, and run the model for 15 epochs.

#Fit the model
bsize = 32
model.fit(X_train, Y_train, batch_size=bsize, epochs=15, validation_split=0.0, verbose=1)

The programming object for the entire model contains all its information, i.e., the specification of the model as well as its fitted coefficients (weights). In the next block of code, we see how to take the model object and apply it to the test data. We use the scikit-learn package to generate the confusion matrix for the fit. We also calculate the accuracy of the model, i.e., the ratio of the sum of the diagonal elements of the confusion matrix to the total of all its elements.

## OUT-SAMPLE ACCURACY
from sklearn.metrics import confusion_matrix
yhat = model.predict_classes(X_test,batch_size=bsize)
cm = confusion_matrix(y_test,yhat)
print("Confusion matrix = "); print(cm)
acc = sum(diag(cm))/sum(cm)
print("Accuracy = ",acc)

We run the code. Here is a sample of the training run. We only display the first few epochs. See Figure cancertraining.

In [2]:
#cancer_training
nb_setup.images_hconcat(["DL_images/cancer_training.png"], width=600)
Out[2]:

Next is the output from testing the fitted model, along with the corresponding confusion matrix. See Figure cancertesting.

In [3]:
#cancer_testing
nb_setup.images_hconcat(["DL_images/cancer_testing.png"], width=600)
Out[3]:

MNIST Data

We move on to the second canonical example, the classic MNIST data set. The data set is read in here, and formatted appropriately.

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

X_train = train.iloc[:,:-1].values
y_train = train.iloc[:,-1].values
X_test = test.iloc[:,:-1].values
y_test = test.iloc[:,-1].values

Y_train = to_categorical(y_train,10)

Notice that the raw MNIST training images form a 3D tensor of size $60,000 \times 28 \times 28$. (A tensor is just a higher-dimensional matrix, the term usually being applied to mathematical structures with more than two dimensions.) This is where the "tensor" moniker comes from; the "flow" part comes from the internal representation of the calculations as a flow network from input to eventual output.
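
In our case, the CSV files already store each image as a flat row of 784 pixel values, so no reshaping is needed. If the images instead arrived as a 3D tensor, they would first be flattened before entering the dense network below; a minimal, hypothetical sketch:

#Hypothetical reshaping step: flatten 28x28 images into 784-element rows
import numpy as np
images = np.random.rand(60000, 28, 28)        #stand-in for the raw MNIST image tensor
X_flat = images.reshape(len(images), 28*28)   #shape becomes (60000, 784)
print(X_flat.shape)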

Each pixel in the data set takes an integer value in the range 0-255, depending on how dark the pixel is. This is normalized to lie in the range (0,1) by dividing all values by 255. This minimal amount of feature engineering helps the model train better.

X_train = X_train/255.0
X_test = X_test/255.0

Define the model in Keras as follows. Note that we have three hidden layers of 512 nodes each. The input layer has 784 elements.

model = Sequential()
n_units = 512
data_dim = X_train.shape[1]

model.add(Dense(n_units, input_dim=data_dim))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(n_units))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(n_units))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(10))
model.add(Activation('softmax'))

Then, compile the model.

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Finally, fit the model. We use a batch size of 32, and 5 epochs. We also keep 10 percent of the sample for validation.

bsize = 32
model.fit(X_train, Y_train, batch_size=bsize, epochs=5, validation_split=0.1, verbose=2)

The fitting run is as follows. We see that the training and validation accuracy are similar to each other, signifying that the model is not being overfit. See Figure mnisttraining.

In [4]:
#mnist_training
nb_setup.images_hconcat(["DL_images/mnist_training.png"], width=600)
Out[4]:

Here is the output from testing the fitted model, and the corresponding confusion matrix. See Figure mnisttesting.

In [5]:
#mnist_testing
nb_setup.images_hconcat(["DL_images/mnist_testing.png"], width=600)
Out[5]:

The accuracy of the model is approximately 89%. We then experimented with a smaller model in which the hidden layers have only 100 nodes each (instead of the 512 used earlier). We see a dramatic improvement in the model fit; the out-of-sample results are shown below, with an accuracy of 96%. A smaller model does, in fact, do better here! Parsimony pays. See Figure mnisttesting2; a sketch of the smaller configuration follows the figure.

In [6]:
#mnist_testing2
nb_setup.images_hconcat(["DL_images/mnist_testing2.png"], width=600)
Out[6]:
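
The smaller model differs only in the width of the hidden layers. A minimal sketch of that configuration, assuming everything else is left unchanged, is:

#Sketch of the smaller network: 100 nodes per hidden layer instead of 512
model = Sequential()
n_units = 100
data_dim = X_train.shape[1]

model.add(Dense(n_units, input_dim=data_dim))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(n_units))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(n_units))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])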

Option Pricing

In a recent article, Culkin and Das (2017) showed how to train a deep learning network to price options using data on option prices and the inputs used to generate those prices.

To do this, option prices were generated by drawing random inputs and feeding them into the well-known Black and Scholes (1973) model. See also Merton (1973). The formula for call options is as follows.

$$ C = S e^{-qT} N(d_1) - K e^{-rT} N(d_2) $$

where

$$ d_1 = \frac{\ln(S/K) + (r-q+0.5 \sigma^2)T}{\sigma \sqrt{T}}; \quad d_2 = d_1 - \sigma \sqrt{T} $$

and $S$ is the current stock price, $K$ is the option strike price, $T$ is the option maturity, $q$ and $r$ are the annualized dividend and risk-free rates, respectively, and $\sigma$ is the annualized volatility, i.e., the annualized standard deviation of returns on the stock. The authors generated 300,000 call prices from randomly generated inputs (see Table 1 in @CulkinDas), and then used this large number of observations to train and test a deep learning net aimed at mimicking the Black-Scholes equation. Here is some sample data used for training the model; the first six columns are the inputs and the last column is the call price, which the network must learn to emit as a positive continuous output. See Figure BSNNsampleData; a sketch of this data-generation step follows the figure.

In [7]:
#BS_NN_sampleData
nb_setup.images_hconcat(["DL_images/BS_NN_sampleData.png"], width=600)
Out[7]:
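
A minimal sketch of this data-generation step is shown below. The function black_scholes_call and the sampling ranges are illustrative assumptions, not the exact scheme used by the authors.

#Sketch only: generate synthetic call prices from random inputs via Black-Scholes
import numpy as np
import pandas as pd
from scipy.stats import norm

def black_scholes_call(S, K, T, q, r, sigma):
    d1 = (np.log(S/K) + (r - q + 0.5*sigma**2)*T)/(sigma*np.sqrt(T))
    d2 = d1 - sigma*np.sqrt(T)
    return S*np.exp(-q*T)*norm.cdf(d1) - K*np.exp(-r*T)*norm.cdf(d2)

n = 300000
S     = np.random.uniform(10, 500, n)      #illustrative parameter ranges
K     = np.random.uniform(7, 650, n)
T     = np.random.uniform(1.0/252, 3.0, n)
q     = np.random.uniform(0.0, 0.03, n)
r     = np.random.uniform(0.01, 0.03, n)
sigma = np.random.uniform(0.05, 0.90, n)

df = pd.DataFrame({"Stock Price": S, "Strike Price": K, "Maturity": T,
                   "Dividends": q, "Risk-free": r, "Volatility": sigma})
df["Call Price"] = black_scholes_call(S, K, T, q, r, sigma)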

Normalization

The Black-Scholes model has the property that it is linearly homogeneous (of degree one) in the stock price $S$ and the strike price $K$. This means that

$$ C(mS,mK) = m \cdot C(S,K) $$

This offers an excellent opportunity to normalize the data so that the level of the strike price does not play a role, which reduces the dimension of the model that needs to be fit. We can therefore normalize spot and call prices by dividing by $K$, removing one variable: $$ \frac{C(S,K)}{K} = C(S/K,1) $$

Therefore the data stored in a pandas data frame (denoted df) is normalized as follows.

## Normalize the data exploiting the fact that the BS model is linearly homogeneous in S,K
df["Stock Price"] = df["Stock Price"]/df["Strike Price"]
df["Call Price"] = df["Call Price"]/df["Strike Price"]

Before feeding the data into TensorFlow, we set it up appropriately into training ($80\%$) and testing data ($20\%$) sets.

n = 300000
n_train =  (int)(0.8 * n)
train = df[0:n_train]
X_train = train[['Stock Price', 'Maturity', 'Dividends', 'Volatility', 'Risk-free']].values
y_train = train['Call Price'].values
test = df[n_train:n]   #remaining 20% of the rows
X_test = test[['Stock Price', 'Maturity', 'Dividends', 'Volatility', 'Risk-free']].values
y_test = test['Call Price'].values

Custom Functions in the DLN

Next, import the TensorFlow and Keras libraries.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, LeakyReLU
from keras import backend

Because the model must produce a positive continuous value for the option price, we cannot use the standard squashing functions available in TensorFlow, such as the sigmoid function. These functions emit values in the range $(0,1)$ and are not suitable for the range $(0,\infty)$, which is what we require. Therefore, we build our own output node activation, shown as a new Python function in the following code block. The exponential function is used because it returns only positive values.

def custom_activation(x):
    return backend.exp(x)

Next, set up and compile the model. We have 4 hidden layers of 120 nodes each, five input nodes (one per normalized input variable), and a single output node. A quick calculation gives the total number of parameters to be fit: $5 \times 120 + 120 = 720$ in the first layer, $120 \times 120 + 120 = 14,520$ in each of the next three, and $120 + 1 = 121$ in the output layer, or $44,401$ in all. The code for this setup is as follows. The loss function used is mean squared error (MSE), and optimization uses the RMSprop algorithm, discussed in Section \@ref(RMSPROP).

nodes = 120
model = Sequential()

model.add(Dense(nodes, input_dim=X_train.shape[1]))
model.add(LeakyReLU())
model.add(Dropout(0.25))

model.add(Dense(nodes, activation='elu'))
model.add(Dropout(0.25))

model.add(Dense(nodes, activation='relu'))
model.add(Dropout(0.25))

model.add(Dense(nodes, activation='elu'))
model.add(Dropout(0.25))

model.add(Dense(1))
model.add(Activation(custom_activation))

model.compile(loss='mse',optimizer='rmsprop')

We then run the fitting method to calibrate the model, using a batch size of 64, 10 epochs, and 10 percent of the sample for validation.

model.fit(X_train, y_train, batch_size=64, epochs=10, validation_split=0.1, verbose=2)

The run time shows the epochs and other metrics.

Train on 216000 samples, validate on 24000 samples
Epoch 1/10
7s - loss: 0.0049 - val_loss: 5.2774e-04
Epoch 2/10
7s - loss: 0.0014 - val_loss: 6.0296e-04
Epoch 3/10
7s - loss: 0.0010 - val_loss: 9.7482e-05
Epoch 4/10
7s - loss: 8.5638e-04 - val_loss: 9.4742e-04
Epoch 5/10
6s - loss: 7.4420e-04 - val_loss: 4.9001e-04
Epoch 6/10
6s - loss: 6.7696e-04 - val_loss: 8.6880e-04
Epoch 7/10
6s - loss: 6.3259e-04 - val_loss: 2.5829e-04
Epoch 8/10
6s - loss: 6.0162e-04 - val_loss: 1.1381e-04
Epoch 9/10
6s - loss: 5.7968e-04 - val_loss: 7.0383e-05
Epoch 10/10
7s - loss: 5.6714e-04 - val_loss: 1.3223e-04

Model Accuracy

We also define a special function to check the accuracy of the model. It generates several common statistics that we may find useful.

def CheckAccuracy(y,y_hat):
    stats = dict()

    stats['diff'] = y - y_hat

    stats['mse'] = mean(stats['diff']**2)
    print("Mean Squared Error:      ", stats['mse'])

    stats['rmse'] = sqrt(stats['mse'])
    print("Root Mean Squared Error: ", stats['rmse'])

    stats['mae'] = mean(abs(stats['diff']))
    print("Mean Absolute Error:     ", stats['mae'])

    stats['mpe'] = sqrt(stats['mse'])/mean(y)   #RMSE as a fraction of the mean price
    print("Mean Percent Error:      ", stats['mpe'])

    #plots: scatter of actual vs predicted prices, and histogram of the errors
    plt.rcParams['agg.path.chunksize'] = 100000   #avoid path-size issues with large scatter plots
    figure(figsize=(14,10))
    plt.scatter(y, y_hat,color='black',linewidth=0.3,alpha=0.4, s=0.5)
    plt.xlabel('Actual Price',fontsize=20,fontname='Times New Roman')
    plt.ylabel('Predicted Price',fontsize=20,fontname='Times New Roman')
    plt.show()

    figure(figsize=(14,10))
    plt.hist(stats['diff'], bins=50,edgecolor='black',color='white')
    plt.xlabel('Diff')
    plt.ylabel('Density')
    plt.show()

    return stats

We then obtain the results as follows.

y_train_hat = model.predict(X_train)
#reduce dim (240000,1) -> (240000,) to match y_train's dim
y_train_hat = squeeze(y_train_hat)
CheckAccuracy(y_train, y_train_hat)

The code above generates the following results in-sample.

Mean Squared Error:       0.000133220249329
Root Mean Squared Error:  0.0115421076641
Mean Absolute Error:      0.00805507329857
Mean Percent Error:       0.0431469231473

We see that the root mean squared error (RMSE) is about $0.01$, i.e., roughly one cent per dollar of strike (recall that prices are normalized by $K$). In percentage terms this is about $4\%$ of the mean price. Hence, the deep learning model does an excellent job of fitting the Black-Scholes option pricing model. The accuracy may be enhanced with more epochs.

The out-of-sample results are obtained analogously. A minimal sketch of the corresponding call, mirroring the in-sample code above:
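
#Out-of-sample check using the same accuracy function
y_test_hat = squeeze(model.predict(X_test))   #flatten predictions to match y_test's shape
test_stats = CheckAccuracy(y_test, y_test_hat)

This produces the following results on the test data.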

Mean Squared Error:       0.000133346104402
Root Mean Squared Error:  0.0115475583741
Mean Absolute Error:      0.00805340310791
Mean Percent Error:       0.0432292686092

These are virtually the same as in-sample, which is hardly surprising because the model is stationary and the out-of-sample data was produced by the same data-generating process. We show the plot of actual versus model-predicted prices, and see that they are highly accurate. See Figure BSNNActualPredicted.

In [8]:
#BS_NN_Actual_Predicted
nb_setup.images_hconcat(["DL_images/BS_NN_Actual_Predicted.png"], width=500)
Out[8]:

The reason the option pricing exercise works so well is that the model is fixed and does not change over time; in time-series speak, the model is "stationary". Applying deep learning to market prediction is much harder and generally has little success because markets are largely efficient. As this article in Bloomberg points out, there are some important reasons why deep learning finds it hard to predict markets: (i) markets are non-stationary, so models need to change constantly; (ii) there is far more noise than signal, i.e., market efficiency holds; (iii) the edge is very small, i.e., the violations of market efficiency, if any, are not material enough to trade on; and (iv) whatever gains are available from prediction are not big enough to offset transaction costs.

CIFAR-10 Dataset

This is a well-known dataset for image recognition. The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Each image is $32 \times 32$ pixels, and there are 60,000 images in 10 categories with 6,000 in each one. Of these, 50,000 are for training and 10,000 for testing. The 10 categories are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

The code in Keras and TensorFlow is as follows. Notice that we do not use the Sequential model as before, but instead define each layer more flexibly; this is an alternative approach supported by Keras (the functional API).

In [9]:
from keras.utils import to_categorical
from keras.datasets import cifar10 

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
num_categories = 10
Using TensorFlow backend.
In [10]:
#Normalize the data as this helps with fitting
x_train = x_train/255.0
x_test = x_test/255.0
In [11]:
#One-hot encoding of the Y variable
y_train = to_categorical(y_train, num_categories)
y_test = to_categorical(y_test, num_categories)
In [12]:
#Examine the shapes of the tensors within
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
(50000, 32, 32, 3)
(10000, 32, 32, 3)
(50000, 10)
(10000, 10)
In [13]:
#Define the Neural Net
from keras.layers import Input, Flatten, Dense
from keras.models import Model

in_layer = Input(shape=(32,32,3))  # The last 3 is for RGB values
x = Flatten()(in_layer)  #passes the input layer into Flatten to make it 1D of size 32x32x3 = 3072
x = Dense(units=200, activation="relu")(x)  #passes previous x into a 200 neuron layer
x = Dense(units=200, activation="relu")(x)  #another layer
x = Dense(units=100, activation="relu")(x)  #another layer
out_layer = Dense(units=10, activation="softmax")(x)
model = Model(in_layer, out_layer)
In [14]:
#Compile the model
from keras.optimizers import Adam
optim = Adam(lr=0.0005)  #lr is the learning rate (eta)
model.compile(loss='categorical_crossentropy', optimizer=optim, metrics=['accuracy'])
In [15]:
model.summary()
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 3072)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               614600    
_________________________________________________________________
dense_2 (Dense)              (None, 200)               40200     
_________________________________________________________________
dense_3 (Dense)              (None, 100)               20100     
_________________________________________________________________
dense_4 (Dense)              (None, 10)                1010      
=================================================================
Total params: 675,910
Trainable params: 675,910
Non-trainable params: 0
_________________________________________________________________

Fit the model

model.fit(x_train, y_train, epochs=15, batch_size=32, shuffle=True)

Get objective function value and accuracy

model.evaluate(x_test, y_test)
In [16]:
nb_setup.images_vconcat(["DL_images/CIFAR10_training.png","DL_images/CIFAR10_evaluation.png"], width=600)
Out[16]:

This is far better than random guessing, which would give an accuracy of 10%. Here we have a training accuracy of 56% and a testing accuracy of 51.65%.

Plot the actual image and prediction

figure(figsize=(6,2))
categories = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']
preds = model.predict(x_test)   #predict once for the whole test set
for j in range(6):
  subplot(1,6,j+1)
  idx = randint(len(x_test))    #pick a random test image
  imshow(x_test[idx])
  print('Image True type  =',categories[argmax(y_test[idx])])
  print('Image Prediction =',categories[argmax(preds[idx])])
In [17]:
nb_setup.images_vconcat(["DL_images/CIFAR10_predictions.png"], width=600)
Out[17]: