%pylab inline
import pandas as pd
from ipypublish import nb_setup
An autoencoder is a neural net that maps features $X$ into labels $Y$, where $Y=X$. Such a network usually has an odd number of layers: the middle layer has the fewest nodes, and the number of nodes per layer grows as the layers move away from the middle. Essentially, the autoencoder transforms the input data $X$ into its smallest representation in the middle layer and then expands this representation as we move towards the output layer. Since the output is the same as the input, the original data is compressed and then decompressed, so the middle layer contains the compressed, i.e., reduced-dimension, version of $X$. The neural net encodes $X$ into a smaller representation $X'$ and then decodes it back into the original $X$. This is called "encoding-decoding" and the neural net is called an "encoder-decoder" network; the more general term is "autoencoder".
This note from the Keras blog provides good detail. Some of the material below is based on this blog.
The MNIST data set is a good test bed for generating compressed images using an autoencoder. The input feature vector is of size 784, compressed (encoded) down to a size of 32 and then decompressed (decoded) back to 784.
#Single fully-connected neural layer as encoder and as decoder
from keras.layers import Input, Dense
from keras.models import Model
#This is the size of our encoded representations
encoding_dim = 32 # 32 floats -> compression of factor 24.5, assuming the input is 784 floats
# this is our input placeholder
input_img = Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(784, activation='sigmoid')(encoded)
# this model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)
Here the model encodes the higher dimension (784) feature set into a lower dimension (32).
#Separate encoder model
# this model maps an input to its encoded representation
encoder = Model(input_img, encoded)
The compressed image is decoded back to its full dimension.
#Separate decoder model
# create a placeholder for an encoded (32-dimensional) input
encoded_input = Input(shape=(encoding_dim,))
# retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# create the decoder model
decoder = Model(encoded_input, decoder_layer(encoded_input))
#Compile the model
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)
#Train the model
autoencoder.fit(x_train, x_train,
                epochs=10,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
# encode and decode some digits
# note that we take them from the *test* set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
print(type(encoded_imgs))
print(encoded_imgs.shape)
print(decoded_imgs.shape)
# use Matplotlib (don't ask)
n = 25 # how many digits we will display
figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = subplot(2, n, i + 1)
    imshow(x_test[i].reshape(28, 28))
    gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # display reconstruction
    ax = subplot(2, n, i + 1 + n)
    imshow(decoded_imgs[i].reshape(28, 28))
    gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
show()
We see that the reconstructed (decoded) images are recognizable. If the encoding dimension were higher (say 64), the reconstructions would of course be clearer.
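To test that claim, one could widen the bottleneck and retrain. A minimal sketch, assuming the same shallow architecture as above (the `_64` variable names are our own, and the model still needs to be fit before comparing reconstructions):
# Hedged sketch: a 64-dimensional bottleneck (compression factor 784/64 = 12.25)
# should preserve more detail than the 32-dimensional one used above.
encoding_dim_64 = 64
input_img_64 = Input(shape=(784,))
encoded_64 = Dense(encoding_dim_64, activation='relu')(input_img_64)
decoded_64 = Dense(784, activation='sigmoid')(encoded_64)
autoencoder_64 = Model(input_img_64, decoded_64)
autoencoder_64.compile(optimizer='adadelta', loss='binary_crossentropy')
# autoencoder_64.fit(x_train, x_train, epochs=10, batch_size=256,
#                    shuffle=True, validation_data=(x_test, x_test))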
We now add more layers, 5 hidden layers in total (the previous example had only one hidden layer). Note the tapering structure of the hidden layers, where the compressed middle layer is reached through progressively smaller layers and then mirrored on the way back out. Because of the additional layers, the nomenclature "deep" autoencoder is apt.
input_img = Input(shape=(784,))
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(encoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train,
                epochs=10,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
# this model maps an input to its encoded representation
encoder = Model(input_img, encoded)
encoded_input = Input(shape=(encoding_dim,))
# retrieve the 3rd-to-last layer of the autoencoder model
decoded = autoencoder.layers[-3](encoded_input)
# retrieve the 2nd-to-last layer of the autoencoder model
decoded = autoencoder.layers[-2](decoded)
# retrieve the last layer of the autoencoder model
decoded = autoencoder.layers[-1](decoded)
# create the decoder model
decoder = Model(encoded_input, decoded)
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
print(encoded_imgs.shape)
decoded_imgs.shape
n = 25 # how many digits we will display
figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = subplot(2, n, i + 1)
    imshow(x_test[i].reshape(28, 28))
    gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # display reconstruction
    ax = subplot(2, n, i + 1 + n)
    imshow(decoded_imgs[i].reshape(28, 28))
    gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
show()
The additional encoding and decoding layers in the deep autoencoder result in better-quality reconstructions than those from the shallow autoencoder.
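One rough way to quantify this is to compare the test-set reconstruction loss of the two models. A sketch, assuming the shallow model was kept in memory under its own (hypothetical) name; in the code above the variable `autoencoder` was reused, so only the deep model's loss is computed directly:
# Test-set reconstruction loss of the current (deep) autoencoder
test_loss = autoencoder.evaluate(x_test, x_test, verbose=0)
print(test_loss)
# Hypothetical comparison, assuming the shallow model was saved separately:
# loss_shallow = autoencoder_shallow.evaluate(x_test, x_test, verbose=0)
# print(loss_shallow)   # a lower loss indicates better reconstructions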
The Treasury yield curve is the relationship between government interest rates and maturity. The curve moves over time as interest rates change. Rates for different maturities do not always change by the same amounts, so the shape of the yield curve evolves. An animated visualization of yield curve dynamics, shown alongside the stock market, is available (click the "animate" button to watch the curve evolve).
Treasury interest rates are assumed to be driven by a few factors, typically three, named for the kinds of movement they produce in the yield curve: level, slope, and curvature. Most of the variation tends to be a level effect, i.e., rates are driven by a single force that moves them all in the same direction.
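For intuition, these three factors are often proxied directly from the curve itself: the level as the average yield, the slope as the long rate minus the short rate, and the curvature as the belly of the curve relative to its ends. A minimal sketch on a purely hypothetical yield vector (the values below are made up for illustration):
import numpy as np
# Hypothetical yield curve across 8 maturities (values in %, purely illustrative)
yields = np.array([4.5, 4.7, 4.9, 5.0, 5.2, 5.4, 5.5, 5.6])
level = yields.mean()                                  # overall level of rates
slope = yields[-1] - yields[0]                         # long rate minus short rate
curvature = yields[4] - 0.5*(yields[0] + yields[-1])   # belly of the curve vs. its ends
print(level, slope, curvature)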
In the following example we take a long time series of yields for 8 maturities (i.e., 8 features) and use an autoencoder to extract the time series of a smaller feature set.
%pylab inline
import pandas as pd
from keras.layers import Input, Dense
from keras.models import Model
We read in the interest rates and then examine the correlation of the 8 series. As you can see, the correlation among the rates is very high, signifying that there may be one major underlying feature driving the entire system of rates over time.
rates = pd.read_csv("DL_data/tryrates.txt", sep="\t")
print(rates.shape)
rates.head()
rates.tail()
rates.corr()
rates = rates.drop("DATE", axis=1)
rates = array(rates)
print(rates.shape)
print(type(rates))
We will attempt to compress the feature set down from dimension 8 to dimension 2.
encoding_dim = 2 #No of factors
x_train = rates
x_test = rates
Multiple layers: we again use 5 hidden layers.
input_img = Input(shape=(8,))
encoded = Dense(6, activation='relu')(input_img)
encoded = Dense(4, activation='relu')(encoded)
encoded = Dense(encoding_dim, activation='relu')(encoded) #Middle layer
decoded = Dense(4, activation='relu')(encoded)
decoded = Dense(6, activation='relu')(decoded)
decoded = Dense(8, activation='sigmoid')(decoded)
autoencoder = Model(input_img, decoded)
#autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
autoencoder.fit(x_train, x_train,
                epochs=15,
                batch_size=32,
                shuffle=True,
                validation_data=(x_test, x_test))
#Needed only for generating output of encoder and decoder
encoder = Model(input_img, encoded)
encoded_input = Input(shape=(encoding_dim,))
decoded = autoencoder.layers[-3](encoded_input)
decoded = autoencoder.layers[-2](decoded)
decoded = autoencoder.layers[-1](decoded)
decoder = Model(encoded_input, decoded)
encoded_imgs = encoder.predict(x_train)
decoded_imgs = decoder.predict(encoded_imgs)
print(encoded_imgs.shape)
print(decoded_imgs.shape)
encoded_imgs[:10,:]
#First factor
plot(encoded_imgs[:,0])
grid()
#Second factor
plot(encoded_imgs[:,1])
grid()
As we see, the autoencoder finds that it can compress the entire 8 dimensions down to a single dimension. We call this a "single factor" model of all interest rates; this factor is the underlying driving force. This main factor appears to be highly correlated with the US inflation rate.
nb_setup.images_vconcat(["DL_images/US_inflation.png"], width=600)
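As a hedged sanity check on the level interpretation, the extracted factor can be compared with the cross-maturity average of the rates. This uses whichever encoded column is non-degenerate (column 0 is assumed here), and sign and scale are arbitrary because of the hidden-layer scaling:
# Correlation of the extracted factor with the average rate across maturities
level = rates.mean(axis=1)            # cross-maturity average at each date
print(corrcoef(level, encoded_imgs[:, 0]))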
We do a similar analysis for the equity markets. We download time series of stock prices for several tickers (21 in total) and then convert the data into daily returns. This becomes the feature set that we aim to compress down to a smaller feature set (of size 5).
%pylab inline
import pandas as pd
from keras.layers import Input, Dense
from keras.models import Model
We download the data from the web and construct a dataframe of stock prices.
# IMPORTING STOCK DATA USING PANDAS
# Remember to "pip install pandas-datareader"
import pandas_datareader.data as web
from datetime import datetime
tickers = ["GOOG","MSFT","AMZN","AAPL","AMAT","ORCL","CSCO","HPQ","INFY","IBM","JNPR","LOGI",
"QCOM","SAP","VMW","WIT","XRX","C","BAC","PG","PEP"]
stkp = web.DataReader(tickers,"yahoo",datetime(2010,1,1),datetime(2018,12,31))
stkp = stkp["Adj Close"]
stkp.head()
stkp.to_csv("DL_data/equity_prices.csv")
Convert stock prices into returns.
#Read in data and prepare for Autoencoder
stkp = pd.read_csv("DL_data/equity_prices.csv")
stkp = stkp.drop("Date", axis=1)
rets = stkp.pct_change()
rets = rets.iloc[1:]
print(rets.shape)
rets.head()
rets = rets.dropna()
rets = array(rets)
rets = rets*100.0
print(rets.shape)
print(type(rets))
We select the number of dimensions we want in the reduced feature set.
encoding_dim = 5 #No of factors
x_train = rets
x_test = rets
Set up the autoencoder with 5 hidden layers.
input_img = Input(shape=(21,))
encoded = Dense(15, activation='tanh')(input_img)
encoded = Dense(9, activation='tanh')(encoded)
encoded = Dense(encoding_dim, activation='tanh')(encoded) #Middle layer
decoded = Dense(9, activation='tanh')(encoded)
decoded = Dense(15, activation='tanh')(decoded)
decoded = Dense(21, activation='linear')(decoded)
Compile and fit the autoencoder.
autoencoder = Model(input_img, decoded)
#autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=32,
                shuffle=True,
                validation_data=(x_test, x_test))
# Needed only for generating output of encoder and decoder
encoder = Model(input_img, encoded)
encoded_input = Input(shape=(encoding_dim,))
decoded = autoencoder.layers[-3](encoded_input)
decoded = autoencoder.layers[-2](decoded)
decoded = autoencoder.layers[-1](decoded)
decoder = Model(encoded_input, decoded)
encoded_imgs = encoder.predict(x_train)
decoded_imgs = decoder.predict(encoded_imgs)
print(encoded_imgs.shape)
print(decoded_imgs.shape)
Show the first few values of the reduced, 5-dimensional data set.
encoded_imgs[:10,:]
The correlation between the factors is not high, as we would expect: they capture different attributes of the data.
corrcoef(encoded_imgs.T)
We plot the first and fourth factors just to see what they look like.
plot(encoded_imgs[:,0])
grid()
plot(encoded_imgs[:,3])
grid()
We also plot the third ticker, AMZN, against its decoded value, and see that they are strongly correlated (75%), as they should be.
plot(rets[:,2], decoded_imgs[:,2], 'bo')
grid()
plot(rets[:,2])
plot(decoded_imgs[:,2])
corrcoef(rets[:,2],decoded_imgs[:,2])
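The same check can be run for every ticker. A minimal sketch computing the correlation between each stock's actual returns and its decoded (reconstructed) returns, assuming the column order of `rets` matches `stkp.columns`, as the code above does:
# Reconstruction quality per ticker: correlation of actual vs. decoded returns
recon_corr = [corrcoef(rets[:, i], decoded_imgs[:, i])[0, 1] for i in range(rets.shape[1])]
for tkr, c in zip(stkp.columns, recon_corr):
    print(tkr, round(c, 3))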