Data Science: Theories, Models, Algorithms, and Analytics

Preface

I developed these class notes for my Machine Learning course, taught with both R and Python. They trace my evolution as a data scientist into redundancy; I expect I will be replaced by a machine soon! A lot of work remains to be done on these notes, including adding many more citations, replacing figures, and making sure full attribution is provided for all referenced material. Please do not cite or distribute at this time. Several chapters remain to be added. I suspect I will be busy working on this for several years, as it is impossible to keep up with the pace of progress in machine learning.

If you find these notes helpful, I also have related notes for other classes:

  1. Deep Learning
  2. Reinforcement Learning

Prologue

“The future is already here; it’s just not very evenly distributed.” – William Gibson

“The public is more familiar with bad design than good design. It is, in effect, conditioned to prefer bad design, because that is what it lives with. The new becomes threatening, the old reassuring.” – Paul Rand

“It seems that perfection is attained not when there is nothing left to add, but when there is nothing more to remove.” – Antoine de Saint-Exupéry

“In God we trust, all others bring data.” – William Edwards Deming

License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this book except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “as is” basis, without warranties or conditions of any kind, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Acknowledgements

I am extremely grateful to the following friends, students, and readers (mutually non-exclusive) who offered me feedback on these chapters. I am most grateful to John Heineke for his constant feedback and continuous encouragement. All the following students made helpful suggestions on the manuscript: Sumit Agarwal, Kevin Aguilar, Sankalp Bansal, Sivan Bershan, Ali Burney, Monalisa Chati, Jian-Wei Cheng, Chris Gadek, Karl Hennig, Pochang Hsu, Justin Ishikawa, Ravi Jagannathan, Alice Yehjin Jun, Seoyoung Kim, Bhushan Kothari, Ram Kumar, Federico Morales, Antonio Piccolboni, Shaharyar Shaikh, Jean-Marc Soumet, Rakesh Sountharajan, Greg Tseng, Dan Wong, Jeffrey Woo.