Applied Mathematics II: Statistical Learning

Description: This course will cover statistical and machine learning theory using foundational approaches (i.e., not neural-network based, as these will be the focus of the third course in the Applied Math sequence). Topics will include probability theory, regression, classification, kernel methods, mixture models and expectation maximization, as well as inference for sequential data using hidden Markov models and linear dynamical systems. Homework assignments will mix pen-and-paper calculations with implementations and applications of machine learning algorithms to real and synthetic data using Python. The main prerequisites for this course are calculus and linear algebra. Prior familiarity with probability and statistics or with coding will be helpful but is not necessary.

By the end of this course, students will be able to:

  • Approach any problem involving data from the perspective of probabilistic inference, using statistical thinking to quantify patterns and relationships in data.
  • Use statistical thinking to quantify the degree of confidence in conclusions that are drawn about patterns and relationships in data.
  • Derive the equations underlying standard machine-learning algorithms mathematically.
  • Implement standard machine-learning algorithms from scratch in Python.
  • Make use of standard implementations of machine-learning algorithms in Python.
  • Generate synthetic data that contains some interesting structure, then use standard machine-learning algorithms to discover that structure.
  • Apply standard machine-learning algorithms to real data to quantify the patterns and relationships that it contains.
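As an illustration of the synthetic-data skill listed above, here is a minimal sketch that is not taken from the course materials. It assumes NumPy is installed, and the function names (make_linear_data, fit_least_squares) and all parameter values are hypothetical choices made for this example. It generates noisy data from a known line, then recovers the slope and intercept by ordinary least squares.

```python
import numpy as np


def make_linear_data(n=200, slope=2.0, intercept=-1.0, noise_std=0.5, seed=0):
    """Draw n points from y = slope * x + intercept + Gaussian noise.

    All default values are illustrative assumptions, not course settings.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=n)
    y = slope * x + intercept + rng.normal(0.0, noise_std, size=n)
    return x, y


def fit_least_squares(x, y):
    """Return [slope, intercept] minimizing the sum of squared residuals."""
    # Design matrix: one column for x, one column of ones for the intercept.
    X = np.column_stack([x, np.ones_like(x)])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


x, y = make_linear_data()
slope_hat, intercept_hat = fit_least_squares(x, y)
print(f"estimated slope = {slope_hat:.3f}, estimated intercept = {intercept_hat:.3f}")
```

Because the generating parameters are known, the estimates can be compared directly against them, which is the main reason for testing an algorithm on synthetic data before applying it to real data.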

Prerequisites

Students should have familiarity with probability and random variables at an undergraduate level, experience with Python programming, calculus, and linear algebra. Completion of the first course in this sequence is sufficient.

Schedule overview:

Tentative list of topics (chapter numbers refer to Pattern Recognition and Machine Learning, by Bishop):

  • Linear predictors (Ch. 3)
  • Classification (Ch. 4)
  • Kernel methods (Ch. 6)
  • Mixture models and the EM algorithm (Ch. 9)
  • Sequential data (Ch. 13)

Additional possible topics (if time permits):

  • Support vector machines (Ch. 7)
  • Approximate inference (Ch. 10)
  • Sampling methods (Ch. 11)
  • Continuous latent variables (Ch. 12)

Homework and assessment:

Homework assignments can be found here: https://github.com/murray-lab/statistical-learning.

Weekly homework will consist of a mix of pen-and-paper problems and computational exercises. Computational homework will be turned in as Jupyter notebooks, which integrate code, simulation results, and LaTeX in a single document. Homework assignments should be turned in via the course’s Canvas page.

Software

  • A recent version of Python
  • A working installation of Jupyter
  • A good code editor like Atom

Books

The course will follow Pattern Recognition and Machine Learning, by Bishop. The book is available for free on the author’s website.

Inclusion and accessibility

Please tell us your preferred pronouns and/or name, especially if it differs from the class roster. We take seriously our responsibility to create inclusive learning environments. Please notify us if there are aspects of the instruction or design of this course that result in barriers to your participation! You are also encouraged to contact the Accessible Education Center in 164 Oregon Hall at 541-346-1155 or uoaec@uoregon.edu.

We are committed to making our classroom an inclusive and respectful learning space. Being respectful includes using preferred pronouns for your classmates. Your classmates come from a diverse set of backgrounds and experiences; please avoid assumptions or stereotypes, and aim for inclusivity. Let us know if there are classroom dynamics that impede your (or someone else’s) full engagement.

Please see this page for more information on campus resources, academic integrity, discrimination, and harassment (and reporting of it).