# Deep Learning

## Subir Varma & Sanjiv Ranjan Das; Notes 2019, 2020, 2022

### Introduction

(NB HTML) | Deep Learning Applications | What is Deep Learning? | Why are DLNs so Effective | Classification of Deep Learning Systems | Supervised Learning | Self Supervised Learning | Un-Supervised Learning | Reinforcement Learning | Historical Perspective: The Perceptron | Neural Net Number of Research Reports | Slides and Additional Reading

### PatternRecognition

(NB HTML) | MNIST | Epoch Accuracy | ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) | Keras Implementation of the MNIST Classifier

### LinearAlgebra_Gradients_Optimization

(NB HTML) | Paradigm | Scalars | Vectors | Matrices | Tensors | Dot Product | Norm of a vector | Euclidean Distance | Cosine of the angle between vectors | Matrix Products | Reshaping tensors | Transpose | Rank | Inverse by hand | Inverse in higher dimension | Applications of the Inverse | Basis | Span | Linear Transformations | Rotation of Axes | Shear | Determinants | Singular Matrix | Determinants in Probability Functions | Portfolio Calculations using Vectors and Matrices | Mean Return | Covariance and Correlation | Covariance of stock returns | Independent Returns | Dependent Returns | Diversification in a stock portfolio | Computing the portfolio standard deviation | as n increases | Plot risk as n increases | Do we still get diversification if the weights are not 1/n? | Multivariate Linear Functions | Quadratic Functions | Vector Calculus | Minimize the function above | Gradient Descent | Euler's Homogeneous Function Theorem | Markowitz's Nobel Prize | SOLUTION:

### LinearAlgebra_Eigensystems_Decompositions

(NB HTML) | Definition | Eigenvalues and eigenvectors | Solving for eigenvectors and eigenvalues | No eigenvectors? | New Discovery between Eigenvalues and Eigenvectors | Eigenbasis | Application of eigen decomposition | Application to Social Networks | Positive definite matrices | Singular Value Decomposition (SVD) | Redo Treasury rates with SVD | Components of SVD | LU Decomposition | Cholesky Decomposition | Cholesky in Monte Carlo

### SupervisedLearning

(NB HTML) | Statistical Classification Formulation | The Cross Entropy Loss Function | The Regression Problem | Summary | References and Slides

### LinearLearningModels

(NB HTML) | Logistic Regression | Sigmoid Function | Linear Separator | Gradient Descent | Stochastic Gradient Descent | Stochastic Gradient Descent Algorithm (SGD) | Batch Stochastic Gradient Descent Algorithm (B-SGD) | Multiclass Logistic Regression | Gradient Descent Algorithm | Image Classification Using Linear Models | Beyond Linear Models | Example of a Linear Model in Keras | References and Slides

### NNDeepLearning

(NB HTML) | Non-Linear Filters | Dense Feed Forward Networks | Nodes vs Layers | Performance as a function of layers | Example of a Dense Feed Forward Network in Keras | Models Using the Keras Layers Module | Models Using the Keras Functional API | Ingesting Data into Keras Models | Ingesting Tabular Data: Predicting a Missing Row Element | Ingesting Tabular Data: Time Series Analysis | Ingesting Text Data | Ingesting Image Data | References and Slides

### TrainingNNsBackprop

(NB HTML) | Vast Scale of Gradient Descent in DLNs | Optimization Problem | Gradient Descent | The Chain Rule of Derivatives | Gradient Flow Calculus | Backprop | The Backwards Pass | The Complete Training Algorithm | Issues with Backprop | Why does Gradient Descent work so well for DLNs? | References and Slides

### GradientDescentTechniques

(NB HTML) | Backprop recap | Issues with Gradient Descent | Learning Rate Annealing | Improvements to the Parameter Update Equation | Momentum | Nesterov Momentum | The ADAGRAD Algorithm | The RMSPROP Algorithm | The Adam Algorithm | Specifying Optimizers in Keras | Choice of Activation Functions | The tanh Function | The ReLU Function | The Leaky ReLU and the PReLU Functions | The MaxOut Function | Specifying Activation Functions in Keras | Initializing the Weight Parameters | Data Preprocessing | Batch Normalization | Power of Batch Normalization | Layer Normalization | The Vanishing Gradient Problem | References and Slides

### ImprovingModelGeneralization

(NB HTML) | Generalization | Underfitting and Overfitting | Model Capacity | The Validation Dataset | Detecting Underfitting | Detecting Overfitting | Regularization | Early Stopping | L2 Regularization | L1 Regularization | Dropout Regularization | Weight Adjustments in Dropout | Batch Stochastic Gradient for Dense Feed Forward Networks with Dropout | Effectiveness of Dropout | Training Data Augmentation | Model Averaging | Summary of Regularization | References and Slides

### HyperParameterSelection

(NB HTML) | Hyperparameters | Choosing the Model | Choosing the Algorithms | How much Data is Needed? | Tuning Hyper-Parameters | Manual Tuning | Automated Tuning | Verifying Code Correctness | References and Slides

### DeepLearningWithR

(NB HTML) | Breast Cancer Data Set | The *deepnet* package | The *neuralnet* package | Using H2O | Image Recognition | Import MNIST CSV as H2O | Using MXNET | Auto detect layout of input matrix, use rowmajor.. | Import MNIST CSV | Auto detect layout of input matrix, use rowmajor.. | Using TensorFlow | Detecting Cancer | Set up and compile the model | Fit the deep learning net | The Hello World of Deep Learning (MNIST) | Normalization | Construct the Deep Learning Net | n_units = 100 tf example is 512, acc=95%, with 100, acc=96% | Compilation | Fit the Model | Quality of Fit | Using TensorFlow with *keras* (instead of *kerasR*)

### DeepLearningWithPython

(NB HTML) | Cancer Data | OUT-SAMPLE ACCURACY | MNIST Data | Option Pricing | Normalization | Normalize the data exploiting the fact that the BS Model is linear homogeneous in S,K | Custom Functions in the DLN | Model Accuracy | CIFAR-10 dataset

### ConvNetsPart1

(NB HTML) | Introduction | Why Dense Feed Forward Networks don't work well for Images | Architecture of ConvNets | Fully Connected DLNs vs ConvNets | Pooling | Global Max Pooling | A Complete ConvNet | Sizing ConvNets | Sizing the Convolutional Layer | Sizing the Pooling Layer | Computations during Convolutions | One Dimensional ConvNets | Backprop for ConvNets | Transfer Learning with ConvNets | References and Slides

### ConvNetsPart2

(NB HTML) | Introduction | Trends in ConvNet Design | Residual Connections | Small Filters in ConvNets | Bottlenecking using 1x1 Filters | Grouped Convolutions | Depthwise Separable Convolutions | ConvNet Architectures | LeNet5 (1998) | AlexNet (2012) | ZFNet (2013) | VGGNet (2014) | Google Inception Network (2014) | ResNet (2015) | Beyond ResNets: ResNext and DenseNet | Xception | MobileNet | Visualizing ConvNets | Visualizing the Proximity Property of Feature Space Representations | Visualizing Local Filters | Visualizing Activation Maps | Identifying Maximally Activating Patches in Images | Generating Images | Generating Images that Maximally Activate a Neuron | Adversarial Images | Generating Images Using Google Deep Dream | Generating Images Using Feature Inversion | Other Image Processing Tasks | Object Localization | Semantic Segmentation | Object Detection | References and Slides

### RNNs

(NB HTML) | Introduction | What are Recurrent Neural Networks | IMDB Movie Review Classification Using an RNN | Examples of RNN Architectures | RNNs with Stacked Layers | Bi-Directional RNNs | RNNs with Dropout | Training RNNs: The Back Propagation Through Time (BPTT) Algorithm | BPTT Forward Pass | BPTT Backward Pass | Truncated Back Propagation through Time | Difficulties with Backprop in RNNs | Solutions to the Vanishing and Exploding Gradients Problems | Long Short Term Memories (LSTMs) | Gated Recurrent Units (GRUs) | LSTMs in Keras

### NLP

(NB HTML) | Introduction | Word Embeddings | Text Classification | Language Models | Next Word Generation | Example of a Character based Language Model | Conditional Language Models | Neural Machine Translation | Image Captioning | The Attention Mechanism | Image Captioning with Attention | Speech Transcription with Attention

### Transformers

(NB HTML) | Introduction | Why RNNs are not Good Enough | Introducing Self Attention | Transformer Architecture | Multiple Attention Heads | A Complete Transformer Block | Keras Model for a Transformer Encoder | Encoding Position Information in Transformers | Visualizing Attention Patterns in Transformers | Relationship between Transformers and Depthwise Separable ConvNets | Language Models using Transformers | Text Completion using Language Models | Summarization using Transformer based Language Models | Encoder-Decoders using Transformers | BERT: Bi-Directional Language Models | Image Processing using Transformers | References and Slides

### AutoEncoders

(NB HTML) | MNIST Example | Encoder | Decoder | Compile and Fit the Autoencoder | Deep AutoEncoder | Dimension Reduction on Treasury Rates | Equity Returns

### GenerativeAdversarialNetworks

(NB HTML) | Overview and Intuition | References

### GenerativeModels

(NB HTML) | Introduction | Overview | Latent Variables and the ELBO Bound | Forward Diffusion Process | Reverse Diffusion Process | Optimization Objective | The DDPM Algorithm | The Neural Network | Speeding Up Diffusion Models: The DDIM Algorithm | DDIM Accelerated Sampling Process | The Latent Diffusion Model (LDM) | Conditional Diffusion Models | Appendix: Multivariate Gaussian Distributions

### ReinforcementLearning

(NB HTML) | Introduction | Dynamic Programming | Card Game | Random Policy Generation | Policy Gradient Search | Q-Learning | Implementation in Python | State-Action Reward and $Q$ Tensors | Q-Learning Algorithm | Running the algorithm | Convergence | Compute the optimal policy grid from the Q grid | Compute expected value function for random sets | Q-learning with Deep Learning Nets vs Dynamic Programming | References

### Glossary_Data

(NB HTML) | Quick Reference | Data for all the code in the book