Machine Learning

Sanjiv Ranjan Das; Copyright 2019

1_DSTMAA

(NB HTML) | Preface | Prologue | License | Acknowledgements

2_DataScience

(NB HTML) | Volume, Velocity, Variety | Machine Learning | Supervised and Unsupervised Learning | Feature Selection | Ensemble Learning | Predictions and Forecasts | Innovation and Experimentation | The Dark Side: Big Errors | The Dark Side: Privacy | The Dark Side: Profiling | Data Brokers, or how the Cookie crumbles | Theories, Models, Intuition, Causality, Prediction, Correlation

3_MachineLearningOverview

(NB HTML) | Jupyter Extensions | 5 Applications of ML in Finance | Many Applications of ML in Finance | J.P. Morgan Guide to ML in Finance | ML Tasks in Finance | ML with NLP | scikit-learn: Python's one-stop shop for ML | Supervised Learning Models | Unsupervised Learning Models | Clustering | Dimension Reduction | Ensemble Methods | Small Business Administration (SBA) Loans Dataset | Feature Engineering | Logistic Regression (Logit) | Limited Dependent Variables | The Logistic Function | Training, validation, and testing data: Underfitting, Overfitting, Cross-Validation | Metrics | More Metrics | The Matthews Correlation Coefficient | ROC and AUC | All In One | ML Comic

4_MathPrimer

(NB HTML) | Logarithms and Exponentials, Continuous Compounding | Calculus | Normal Distribution | Poisson Distribution | Moments of Random Variables | Combining Random Variables | Vector Algebra | Basic Regression Model (OLS) | Regression with Dummy Variables | Matrix Calculations | Diversification of a Portfolio | Diversification exercise | Matrix Equations | Matrix algebra exercise | Matrix Calculus | More exercises | Complex Numbers and Euler's Equation | Algorithm Complexity, Big-O notation

5_IntroductoryRprogramming

(NB HTML) | Got R? | System Commands | Loading Data | Getting External Stock Data | Data Frames | Higher-Order Moments | Reading space delimited files | Pipes with *magrittr* | Matrices | Root Finding | Regression | P-Values, t-statistics | Parts of a regression | Bias in regression coefficients | Heteroskedasticity | Auto-Regressive Models | Maximum Likelihood | Logit | Probit | ARCH and GARCH | Vector Autoregression | Solving Non-Linear Equations | Web-Enabling R Functions | Causal Inference | Top 10 Coding Mistakes made by Data Scientists

6_Linear_Regression

(NB HTML) | Predict the guaranteed percentage as a function of the other variables | Run the regression using scikit-learn | Run the regression using statsmodels

7_LogisticRegression

(NB HTML) | Limited Dependent Variables | The Logistic Function | Odds Ratio | Odds Ratio Coefficients | Metrics | AUC of the ROC curve | More Metrics | Using R | Multinomial Logit

8_MoreDataHandling

(NB HTML) | Getting started | Data Extraction of stocks using the *quantmod* package | Print the length of each stock series. | Convert closing adjusted prices of all stocks into individual data.frames. | Using the *merge* function | Make a single data frame | Plot the stock series | Convert the data into returns | Descriptive statistics | Correlation matrix | Correlogram | Market regression | Return versus systematic risk | Extracting online corporate data | Get all stock symbols from exchanges | Using the DT package | Web scraping | Using the *apply* class of functions | Getting interest rate data from FRED | Using the custom function | Organize the data by date | Handling missing values | Cross-Sectional Data (an example) | Access file from the web using the *readLines* function | Or, read the file from CSV | Failed banks by State | Check for missing data | Sort by State | Use the *aggregate* function (for subtotals) | Data by acquiring bank | Handling dates with *lubridate* | By Month | By Day | Using the *data.table* package | California Crime Statistics | Examine the *data.table* | Indexing the *data.table* | Plotting from the *data.table* | By County | Barplot of crime | Bay Area Bike Share data | Summarize Trips Data | Start and End Bike Stations | The *plyr* package family | Filter the data | Sorting using the *arrange* function | Reverse order sort | Descriptive statistics | Other functions in *dplyr* | Application to IPO Data | Bokeh plots

9_Shiny

(NB HTML) | The Black-Scholes-Merton (1973) model | The application program | SERVER | UI | Run | Running the App | Server section of the App | UI section of the App | Using the *reactive* mode in the app | UI | SERVER | Market Liquidity in Real Time using *Shiny* | Program files | Read in the URL for the ETF ticker | Process page for Previous Closing Price | Process page for NAV | Compute BILLIQ | Process page for Yield | Output | Read in the URL for the ETF ticker | Process page for Previous Closing Price | Process page for NAV | Compute BILLIQ | Process page for Yield | Output | Using *Shiny* with Data Table

10_Widgets

(NB HTML) | INTERACTIVE WIDGETS IN PYTHON | ipywidgets | Using widgets for Black-Scholes option pricing | Datatables html widgets

11_BayesModels

(NB HTML) | Bayes' Theorem | Example: AIDS Testing | Computational Approach using Sets | A Second Opinion | Correlated Default | Indicator Functions for Default | Default Correlation | Continuous Space Bayes Theorem | Formulation | Posterior Distribution | Example | General Formula for $n$ sequential updates | Bayes Classifier | Posterior | Example | Bayes Classifier in sklearn | NCAA Dataset | Bayes Nets | Conditional Probability - 1 | Computational set-theoretic approach | Conditional Probability - 2 | R Packages for Bayes Nets | Bayes in Marketing | Product Launch? | Pilot Test | Other Marketing Applications

12_Naive_Bayes

(NB HTML) | What is Naive Bayes? | NCAA Dataset | Credit Card Dataset

13_SVM

(NB HTML) | What is an SVM? | Hyperplane Geometry | Regularization | NCAA Dataset

14_kNN

(NB HTML) | What is kNN? | Classified Neighborhoods | NCAA Dataset | Credit Card Dataset

15_Decision_Trees

(NB HTML) | Prediction Trees | Recursive Partitioning | C4.5 Classifier | NCAA Dataset | Gini coefficient | Credit Card Dataset | Explainability and Sensitivity

16_RandomForest

(NB HTML) | Kaggle's Credit Card Fraud Dataset - RF | Quick Class Counts | Mean Amount in Each Class | Under/over-sample with SMOTE-ENN to overcome class imbalance | Different types of Resampling methods | How does SMOTE work? | Keep original training data before SMOTE | Train & Predict | Evaluate predictions | Accuracy | scikit-learn's classification report gives us a more complete picture. | ROC Curve & AUC | Confusion Matrix | Logistic Regression Reprise (after oversampling)

17_DiscriminantFactorAnalysis

(NB HTML) | Introduction | Discriminant Analysis | Notation and assumptions | Discriminant Function | How good is the discriminant function? | Confusion Matrix | Example Using Basketball Data | Confusion Matrix | Explanation of LDA | Fisher's Discriminant | Generalizing the number of groups | Eigen Systems | Intuition | Determinants | Dimension Reduction: Factor Analysis and PCA | Notation | The Idea | Principal Components Analysis (PCA) | Difference between PCA and LDA | Application to Treasury Yield Curves | Results | Difference between PCA and FA | Factor Rotation | Using the factor analysis function | Dimension Reduction in Python | NCAA dataset | A Matrix Reduction | Where does matrix L come from? | Principal Components Analysis (PCA) | Treasury Rates Dataset | Clearly, all interest rates are driven by one major component! | Inverted Yield Curve

18_Dimension_Reduction

(NB HTML) | NCAA dataset | A Matrix Reduction | Where does matrix L come from? | Principal Components Analysis (PCA) | Treasury Rates Dataset | Clearly, all interest rates are driven by one major component!

19_Discriminant_Analysis

(NB HTML) | Credit Card Dataset | Linear Discriminant Analysis | Discriminant Function | NCAA Dataset

20_LimitedDependentVariables

(NB HTML) | Maximum-Likelihood Estimation (MLE) | Likelihood | Implementation | Logit and Probit Models | Relation to classifiers | Logit | Example | Probit | Analysis | Odds Ratio and Slopes (Coefficients) in a Logit | Calculus of the logit coefficients | How about the Probit model? | Maximum-Likelihood Estimation (MLE) of these Choice Models | Multinomial Logit | When OLS fails | Truncated Variables and Sample Selection | Inverse Mills Ratio | Example: Limited Dependent Variables in VC Syndications | Sample Selection Problems (and endogeneity) | Example: Women in the Labor Market | Example using a binary outcome for the selection model | We estimate the effect of women's education on their chances of earning a high wage (> $5/hr in 1975 USD), using PSID data | We use education as the explanatory variable and add age, kids, and non-work income as exclusion restrictions | Endogeneity: Some Theory to Wrap Up | Cox Proportional Hazards Model | GLMNET: Lasso and Ridge Regressions | Prediction on test data | ROC Curves | Glmnet Cox Models

21_ClusterAnalysis_Boosting

(NB HTML) | Overview | k-means | Nearness / distance metrics | Example: Randomly generated data in k-means | Random starts do help here with too many clusters | Example: NCAA teams | Hierarchical Clustering | k Nearest Neighbors | Prediction Trees | Fitting the tree | Classification Trees | C4.5 Classifier | Regression Trees | Example: California Home Data | Random Forests | Top Ten Algorithms in Data Science | Boosting

22_Clustering

(NB HTML) | Overview | SBA dataset | Clustering on PCA reduced data | Hierarchical Clustering (bottom up) | NCAA dataset | Redo Hierarchical Clustering in R

23_TextAnalytics

(NB HTML) | News Analysis | News Analytics | Text as Data | Definition: Text-Mining | Algorithm Complexity | The Response to News | Breakdown of news flow | Frequency of posting | Weekly posting | Intraday posting | Number of characters per posting | Examples: Basic Text Handling | Using List Comprehensions to find specific words | String operations | Read in a URL | Reading in a URL | Use Beautiful Soup to strip out the HTML markup | Dictionaries | Lexicons | Constructing a lexicon | Lexicons as Word Lists | Negation Tagging | Scoring Text | Read in a dictionary | Read in a file | Here we will read in an entire dictionary from Harvard Inquirer | Sentiment Score the Text using this Dictionary from Harvard Inquirer | General Function to Pull Financial Text and score it | Parts of Speech (POS) Tagging | Twitter API | Using FastText to analyze Tweets | JSON | Using the NLTK package to conduct sentiment analysis without a dictionary | Extracting tweets with a hashtag | News Extractor: Reading in parts of a URL | Here we will read in an entire dictionary from Harvard Inquirer | Remove punctuation from headlines | Remove Numbers | Stemming | Remove Stopwords | Write all docs to separate text files | Create a Corpus | Term Document Matrix | Term Frequency - Inverse Document Frequency (TF-IDF) | WordClouds | Cosine Similarity in the Text Domain | Readability of Text | Gunning-Fog Index | Flesch Score | The Flesch-Kincaid Grade Level | koRpus package | Text Summarization | Generating the summary | One Function to Rule All Text in R | Example: Summarization | Modern Methods | Topic Modeling using LDA | LDA Explained (Briefly) | Matrix $A$: Connecting Documents with Topics | Matrix $B$: Connecting Words with Topics | Distribution of Topics in a Document | Distribution of Words and Topics for a Document | Likelihood of the entire Corpus | Word2Vec: Word Embeddings | Skip-gram | GloVe (Global Vectors) | word2vec fitting with neural nets | Doc2Vec | RegTech | Research in Finance

24_TextAnaytics_Advanced

(NB HTML) | Basic Textual Data | Basic Text Cleanup | Lemmatization | Non-negative Matrix Factorization (NMF) | Iterative Solution for NMF | NMF by gradient descent | Singular Value Decomposition (SVD) | Latent Semantic Analysis (LSA) | How is LSA implemented using SVD? | Example in R | LSA and Singular Value Decomposition (SVD) | Dimension reduction of the TDM via LSA | LSA and SVD: the connection? | What is the rank of the TDM? | Classification and Word Embeddings using text2vec in R | Preprocessing and tokenization | Iterate and Vectorize | Document Term Matrix (DTM) | N-Grams | TF-IDF | Refit classifier | Now we take the TF-IDF adjusted DTM and run the classifier. | Word Embeddings | GloVe | word2vec explained | Topic Analysis using text2vec | Entity Extraction | Using spaCy | t-Distributed Stochastic Neighbor Embedding (t-SNE) | Reuters news corpus for t-SNE | Knowledge Graphs | Neural Text Generation | Text Classification with Neural Nets | Linguistic Markers

25_Networks

(NB HTML) | Networks are beautiful | Small Worlds | Graphs | Types of graphs | Adjacency Matrix | igraph package | Graph Attributes | Dijkstra's Shortest Paths Algorithm | D3 plots | Centrality | Betweenness | Communities | Using NetworkX in Python | Undirected graphs | Directed Graphs | Degree, Eigenvector, Betweenness Centrality | Financial Applications | Risk Networks | Example | Adjacency Matrix | Overall Risk Score | Risk Decomposition | Centrality | Risk Increment | Criticality | Cross Risk | Risk Scaling: Spillovers | Risk Scaling with Increased Connectivity | Too Big To Fail | Systemic Risk in Indian Banks | Systemic Risk Portals | Shiny application

26_Recommenders

(NB HTML) | Introduction | Alternating Least Squares | Solve $u$ matrix | Solve $m$ matrix | ALS package | Interpretation and Use | ALS in Python

27_ProductForecastingBassModel

(NB HTML) | The Death of Mathematics | Main Ideas | Historical Examples | The Basic Idea | Main Differential Equation | Solving the Model for $F(t)$ | Another solution | Solve for $f(t)$ | Example | Symbolic Math in R | Solution using Wolfram Alpha | Calibration | iPhone Sales Forecast | Comparison to other products | Sales Peak | Samsung Galaxy Phone Sales | Global Semiconductor Sales | Extensions | Trading off $p$ vs $q$

28_DeepLearning_Introduction

(NB HTML) | Sanjiv R. Das | Professor of Finance and Data Science | Santa Clara University | Net input | Examples of Different Types of Neurons | Sigmoid | ReLU (rectified linear unit) | TanH (hyperbolic tangent) | Output Layer | More on Cross Entropy | Entropy | Kullback-Leibler Divergence | SoftMax Function | Delta of Softmax | Batch Stochastic Gradient | Fitting the NN | Gradient Descent Example | Vanishing Gradients | Gradients in Multiple Dimensions | Saddle Points | Effect of the Learning Rate | Annealing | Learning Rate Algorithms | Momentum | Behavior of Momentum Parameter | Properties of the Momentum algorithm | Nesterov Momentum | The ADAGRAD Algorithm | The RMSPROP Algorithm | The ADAM Algorithm | Back to Activation Functions (Vanishing Gradients) | tanh Activation | ReLU Activation | Dead ReLU Problem | Leaky ReLU | PReLU | Maxout | Initializing Weights | Data Preprocessing | Zero-Centering | Zero Centering Helps | Batch Normalization | Underfitting and Overfitting | Regularization | Early Stopping | L2 Regularization | L1 Regularization | Dropout regularization | Bagging (Ensemble Learning) | TensorFlow Playground | Pattern Recognition: Cancer | Read in the data set | One-Hot Encoding | Convert the class variable into binary numeric | Make label data have 1-shape, 1=malignant | Keras to define DLN | Define the neural net and compile it | Train the Model | Fit/train the model (x,y need to be matrices) | Accuracy | Confusion matrix | Read in the data set | Reformat the data | Define the neural net and compile it | Fit/train the model (x,y need to be matrices) | In Sample | Confusion matrix | Out of Sample | Confusion matrix | Image Processing, Transfer Learning | Relevant Applications | Causal Models | THE END

29_Fourier

(NB HTML) | Introduction | Fourier Series | Unit Circle | Angular velocity | Fourier series | Radians | Solving for the coefficients | Complex Algebra | Trig to Complex | Getting rid of $a_0$ | Collapsing and Simplifying | Fourier Transform | Empirical Example | Application to Binomial Option Pricing | Application to probability functions | Characteristic functions | Finance application | Solving for the characteristic function | Computing the moments | Probability density function

30_FinanceModels

(NB HTML) | Brownian Motions: Quick Introduction | Monte Carlo Simulation | Vectorization | Bivariate random variables | Multivariate random variables | Portfolio Computations | Optimal Portfolio | What if we reduced beta? | Interest-rate processes (O-U process) | Cox-Ingersoll-Ross (CIR) model | Estimating the historical parameters for interest rates | Download data and run interest-rate estimation | Black-Scholes Recap | Option Pricing by Simulation | Antithetic Variance Reduction | Control Variate Technique (CVT) | American Options: Simulation with Optimization | American Options: Binomial Trees | Longstaff-Schwartz Least Squares Model

31_PortfolioOptimization

(NB HTML) | Diversification of a portfolio | Markowitz Portfolio Problem | The Solution by Lagrange Multipliers | Optimization | Notes on the solution | The Optimal Portfolio Function | Example | A different expected return | Numerical Optimization with Constraints | The Efficient Frontier | Covariances of frontier portfolios | Combinations | Exercise | Portfolio problem with riskless assets | Example

32_DigitalPortfolios

(NB HTML) | Digital Assets | Modeling Digital Portfolios | Fast Computation Approach | Combining conditional distributions | Stochastic Dominance (SD) | Portfolio Characteristics | How many assets? | The impact of correlation | Uneven bets? | Mixing safe and risky assets | Conclusions

33_Gambling

(NB HTML) | Introduction | Odds | Edge | Bookmakers | The Kelly Criterion | Simulation of the betting strategy | Half-Kelly | Deriving the Kelly Criterion | Entropy | Linking the Kelly Criterion to Entropy | Linking the Kelly criterion to portfolio optimization | Implementing day trading | Casino Games

34_Auctions

(NB HTML) | Introduction | Overview | Auction types | Value Determination | Bidder Types | Benchmark Model (BM) | Properties of the BM | Auction Math and Stats: Seller's Expected Revenue | Optimization by bidders | Example | Treasury Auctions | DPA or UPA? | Collusion | Web Advertising Auctions | Quick tutorial on Google Ad Auctions | Next Price Auctions | Laddered Auction | Remaining questions to ponder

35_Data_Tech

(NB HTML) | Virtualization and Containers | Download