Spring 2021
MIT 12.S592 (U/G)
Lec: F 9-11, Zoom
Instructor: Sai Ravela (ravela@mit.edu)

Dynamics, Optimization, and Information foundations of Machine Learning

ML-based System Dynamics and Optimization for Earth, Planets, Climate, and Life

Flexible participation model

Prereqs: Linear Algebra, Probability & Statistics, a first course in ML, and some Systems preparation (at least one of Statistical Signal Processing, Estimation, Control, or Optimization)
Motivation
Despite numerous toolkits and easily googled introductory tutorials, useful problem-solving is often stymied beyond the initial "copy-paste" application of Machine Learning. For example, does model selection leave you dissatisfied? Does finding great features seem difficult? Do you wish learning were more efficient, or at least that the machines were better designed? Do you wonder what general information principles underpin learning, or how to learn from theory and data? If you answered yes to any of these questions, MLSDO is here to help. The course 12.S592 (MLSDO) explores machine learning from a novel and rigorous systems-dynamics and optimization perspective. This allows you to understand its strengths and weaknesses and to confidently consider learning machines for your work.
The application of machine learning to science is a central theme. We are especially interested in Earth, Planets, Climate, and Life, but engineering, materials, media, and policy are close relatives. Learning plays an increasing role in these problems. It can be useful for quantifying uncertainty and risk, and for representing parts of the physics that are difficult to model numerically. ML can even discover dynamical equations from data! In general, however, ML must contend with the extraordinary skill of theory, principles, laws, and knowledge :) ML is a relative novice: even simple physical principles are challenging to learn from data, and practical machines often encode them explicitly. Therefore, how to build hybrid systems, part theory-driven and part data-driven, is a crucial question into which we make some inroads. To put it in a picture, this is the world this course lives in:
The course centers on investigating the dynamics of learning and its optimization, embedded in dynamic data-driven application systems.
Here is a snapshot of various aspects of the course
Who Benefits?
This course is most beneficial if you have gotten past writing "Hello World" ML applications, e.g., using MNIST or CIFAR. It is quite helpful when you are looking to develop methodology for real-world problems. This course is great fun for those who like to think from first principles and want to examine ML that way. Those with a Physics, Engineering, or Mathematical preparation will particularly enjoy 12.S592. Participants from industry seem to like this course.
About Us
MLSDO (and its predecessor) has been taught since 2013. The present course is taught by a Systems Scientist together with several Guest Lecturers, including an Applied Mathematician, an AI architect from industry, and other experts who describe the successes, opportunities, and challenges for ML in their fields. ML-practicing scientists, many a product of previous versions of this course, offer tutorials to help you become skillful and support your ML growth.
Structure and Content
MLSDO is an "infinite course," continuing each term as it round-robins among topics. Please attend the first two lectures; the material moves quickly! Typically we have the following components: Lectures (fundamentals), Guest Lectures (applications), Tutorials (methods), and PSETs/Projects (problem-solving). Here is the proposed set for this term:
Foundations: Dynamics of Learning
 Systems Dynamics and Optimization (SDO): First, we lose the "irrational exuberance."
 Predictive Systems and Discovery Machines that interact with the complex, non-stationary real world in a variety of physical applications.
 Theory, Models, Data, and the SDO cycle.
 How ML promises to improve efficacy and reduce model error.
 Model Error in Learning and its disastrous effects.
 Do we need theory to help Learning? The role that Conservation, Invariance, Symmetry, Smoothness, Sparsity, Feedbacks and other "theorydriven" constraints play on learning.
 What information principles help machines become efficacious? The notion of Information Gain.
 In an Ideal World: Using a linear-Gaussian world, we look at optimal estimation, developing basic ideas in organizing data into features, inference on graphs, and learning for information gain. We close the loop for sequential processing through recursive estimation. We then examine how the linear-Gaussian picture degrades in a nonlinear, non-Gaussian world with complex structure.
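The recursive-estimation idea above fits in a few lines. The following is an illustrative recursive least-squares loop (names and numbers are our own, not course material): each new linear-Gaussian observation updates the current estimate and its covariance.

```python
import numpy as np

def rls_update(theta, P, x, y, noise_var=1.0):
    """One recursive least-squares step: fold the observation (x, y)
    into estimate theta with covariance P (linear-Gaussian setting)."""
    x = x.reshape(-1, 1)
    # Kalman-style gain: how much to trust the new observation.
    K = P @ x / (noise_var + float(x.T @ P @ x))
    theta = theta + K.flatten() * (y - float(x @ theta if x.ndim == 1 else x.T @ theta.reshape(-1, 1)))
    P = P - K @ x.T @ P
    return theta, P

# Recover a known linear model y = 2*x0 - x1 from streaming data.
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
theta, P = np.zeros(2), 10.0 * np.eye(2)  # broad prior
for _ in range(50):
    x = rng.normal(size=2)
    theta, P = rls_update(theta, P, x, x @ true_theta)
# theta is now close to true_theta, and P has shrunk.
```

The same recursion, with a dynamics model inserted between updates, is the Kalman filter's measurement step.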
 Learning Dilemmas: We understand fundamentally how difficult it is to "learn well."
 Risk vs. Empirical Risk, Generalization vs. Extrapolation, Stability vs. Generalization, Consistency
 Localization, hyperparameters, Bias vs. Variance, Invariance vs. Selectivity, No Free Lunch and Universal Computation.
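The risk-vs-empirical-risk dilemma above can be made concrete with a toy experiment (our own illustration, not course code): a high-degree polynomial drives the empirical (training) risk below that of a simple model, while its true risk, estimated on noise-free held-out data, stays larger.

```python
import numpy as np

# Training data: noisy samples of a smooth function; test data is noise-free.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.size)
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(3 * x_test)

def risks(degree):
    """Empirical (training) risk and an estimate of the true risk
    for a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    emp = np.mean((np.polyval(coeffs, x) - y) ** 2)
    true = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return emp, true

emp_lo, true_lo = risks(1)    # high bias: both risks are large
emp_hi, true_hi = risks(15)   # high variance: training risk shrinks, true risk does not
```

Minimizing empirical risk alone thus says little about generalization; the gap between `emp_hi` and `true_hi` is the point.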
 Inference on Graphs:
 Variational Inference - two-point boundary value problems
 Bayesian Inference - sampling, ensemble methods, and variational Bayes
 Probabilistic Graphical Models with Model Selection.
 Kernels: We then explore data and features. Kernels (symmetric positive-definite forms), graph spectra, key kernel applications, and the "kernelization" of many ML algorithms.
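As a sketch of what "kernelization" means (our own illustration, not course code): ridge regression rewritten so the data enter only through evaluations of a symmetric positive-definite kernel, here an RBF Gram matrix. The kernel width `gamma` and penalty `lam` below are made-up values.

```python
import numpy as np

def rbf_gram(X, Z, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2), a symmetric
    positive-definite kernel, the basic object behind kernelization."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kernel_ridge(X, y, X_new, gamma=25.0, lam=1e-4):
    """Ridge regression 'kernelized': solve in the dual, so inner
    products are replaced by kernel evaluations."""
    K = rbf_gram(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # dual coefficients
    return rbf_gram(X_new, X, gamma) @ alpha

# Fit a smooth 1-D function; predictions at the training points are close.
X = np.linspace(0, 1, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
pred = kernel_ridge(X, y, X)
```

The same substitution (Gram matrix for inner products) yields KPCA, SVMs, and Gaussian process regression from their linear counterparts.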
 Measures and Information: We get to the very core by thinking together about similarity measures and how to optimize them in practice. We study
 p-norms, RKHS,
 Entropy, Mutual Information, and Quadratic Mutual Information
 Optimal Transport and Field Alignment.
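A minimal sketch of the information measures just listed (illustrative only): Shannon entropy and the identity I(X;Y) = H(X) + H(Y) - H(X,Y) for a discrete joint distribution.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a normalized probability array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a discrete joint pxy."""
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy.ravel())

# Perfectly dependent pair: I(X;Y) = H(X) = 1 bit.
pxy_dep = np.array([[0.5, 0.0],
                    [0.0, 0.5]])
# Independent pair: I(X;Y) = 0 bits.
pxy_ind = np.outer([0.5, 0.5], [0.5, 0.5])
mi_dep = mutual_information(pxy_dep)   # 1.0
mi_ind = mutual_information(pxy_ind)   # 0.0
```

Estimating these quantities from samples rather than known distributions is where the practical difficulty, and the quadratic mutual information variants, come in.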
 Regularization: We investigate the tussle between "fitting" and "constraining."
 Tikhonov, p-norm, Natural Statistics, Entropy, Randomization, Augmentation, Leave-out, Theory-guided.
 Smoothness vs. sparsity, Relationship to Bayes
 Tunable Sparsity.
 Concentration Inequalities with application to model selection.
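The "fitting vs. constraining" tussle has a one-line closed form in the Tikhonov case. The sketch below (our own, with made-up numbers) shows how a small penalty stabilizes an ill-conditioned fit with nearly collinear columns.

```python
import numpy as np

def tikhonov(A, b, lam):
    """Tikhonov-regularized least squares:
    argmin_x ||A x - b||^2 + lam ||x||^2 = (A^T A + lam I)^{-1} A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Nearly collinear columns make the unregularized fit explode.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
b = np.array([2.0, 2.1, 1.9])
x_small = tikhonov(A, b, 1e-10)  # nearly unregularized: huge, unstable weights
x_reg = tikhonov(A, b, 1e-2)     # constrained: small, stable weights near [1, 1]
```

Replacing the squared-norm penalty with a p-norm or an entropy term changes the constraint's character (sparsity, smoothness) but not this basic trade-off.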
 Dynamics of Deep Learning [32]:
 Dynamical systems associated with Learning
 Two-Point Boundary Value Problems, Error Dynamics, and Stability.
 Bayesian Parameter Uncertainty Estimation
 Stochastic formulation and Information Transfer Efficiency.
 Informative Learning [36]:
 Understand the notion of information gain and its unifying framework for data selection, parameter and structure selection, feature selection, and adaptation.
 Understand the hardness of quantifying information gain
 Develop practical strategies such as
 sparse optimization and playing the game of 20 questions
 Learning for Information Gain applied to targeting data and parameter selection.
 Estimating Network Structure  Initialization and adaptation
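Greedy information-gain selection, the 20-questions game above, can be sketched as follows (an illustrative toy, not the course's method): pick the query whose answer maximally reduces the expected entropy over the hypotheses.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a normalized probability array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def best_question(prior, answers):
    """Pick the question with maximal expected information gain.
    answers[q, h] is 1 if question q is answered 'yes' under hypothesis h."""
    gains = []
    for row in answers:
        p_yes = prior[row == 1].sum()
        cond = 0.0  # expected posterior entropy over the two answers
        for val, p in ((1, p_yes), (0, 1.0 - p_yes)):
            if p > 0:
                post = prior * (row == val) / p
                cond += p * entropy(post)
        gains.append(entropy(prior) - cond)
    return int(np.argmax(gains)), gains

# Four equally likely hypotheses. Question 0 splits them 2/2 (a full bit
# of gain); question 1 splits them 3/1 (less informative).
prior = np.full(4, 0.25)
answers = np.array([[1, 1, 0, 0],
                    [1, 1, 1, 0]])
q, gains = best_question(prior, answers)
```

The same greedy criterion, applied to candidate data, parameters, or network structure, is the unifying thread in the bullets above.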
 Neural Dynamical Systems [39]:
 Define and Understand Continuous- and Discrete-time Neural Dynamical Systems
 Develop Exact Networks and Bounds on Approximate Networks
 Develop Stable Learning Formulations for Hybrid Neural Dynamics
 Develop approaches to Uncertainty Quantification
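As a hedged sketch of a discrete-time neural dynamical system (our own toy, not course code): a forward-Euler step of dx/dt = f_theta(x), with weights chosen by hand so the network mimics linear decay.

```python
import numpy as np

def neural_ode_step(x, W1, b1, W2, b2, dt):
    """One forward-Euler step of dx/dt = f(x), where f is a small tanh
    network: the discrete-time view of a neural dynamical system."""
    f = W2 @ np.tanh(W1 @ x + b1) + b2
    return x + dt * f

# Hand-set weights so the vector field mimics linear decay dx/dt = -x
# (tanh stays in its linear regime at these magnitudes).
W1 = 0.01 * np.eye(2)
b1 = np.zeros(2)
W2 = -100.0 * np.eye(2)
b2 = np.zeros(2)

x = np.array([1.0, -0.5])
for _ in range(100):
    x = neural_ode_step(x, W1, b1, W2, b2, dt=0.01)
# After integrating to t = 1, x is close to exp(-1) times its initial value.
```

Learning replaces the hand-set weights with trained ones; stability of the discrete map under such training is one of the formulation questions above.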
 Relevance, Recommendation, and Reinforcement [Optional]: Many realworld applications with dynamical systems will require the learning machine to keep up with a changing world.
 Feedback mechanisms for incremental (and often online) learning using notions of information gain.
 Application to (some subset of) Search Engines, Recommender Systems, Stochastic Dynamic Programming, Control, Estimation, and Reinforcement Learning
The optional topics are ones that we may insert into the course by removing others from the list. We'll do this in the first two classes through a discussion.
Guest and Special Lectures [39]
 Sara Seager [Exoplanets]
 Dirk Smit [Seismic Inversion]
 Hao Sun [Learning Physics from Data]
 Brent Minchew [Glaciers]
Methods (These change from term to term). [1 each]
Starting in Fall '19, a section was introduced for methods. The group of ML practitioners has grown rapidly, and the community this course has fostered enables us to form an effective peer group to support your learning and growth.
 Trees, Hashes, and Nearest Neighbors - Indexing Schemes Tutorial
 Principal and Independent Components
 Sampling and Hamiltonian Monte Carlo
 Expectation Maximization with Application
 Gaussian Graphical Models
 Kernel Machines - GPR, KPCA, SVM, and Manifolds (x2)
 Convolutional Networks
 Long Short-Term Memory
 AutoEncoders
 Generative Adversarial Networks
 Boosting and Bagging
PSETS: Application to Systems Dynamics and Optimization
The PSET problems highlight key topics of interest in learning dynamics; both real data sets and idealized models may be used to communicate key ideas and the potential role of learning. Three PSETs from the following topics are typically developed. The topics themselves are drawn from applications (see the end of this page).
 Detection: Use theory and data to learn a very low SNR detector. Use learning to solve inverse problems more efficaciously [Level: Intermediate]
 Targeted Uncertainty Quantification: Learn reduced models for rapid, targeted uncertainty quantification [Level: Intermediate]
 Informative Learning: Automatically select parameters, data and model structure to maximize information gain while learning with continuous time neural networks [Level: Hard]
 Stable Parameterization: Learn to build hybrid physicalneural dynamical systems that integrate in a stable manner [Level: Easy]
 Interval Smoothing: Rapidly train a neural dynamical system from data [Level: Easy]
 Downscaling: Use Machine Learning to produce a superresolution method for fluid dynamics [Level: Hard]
 Oscillator Discovery: Use model simulations of fluids to detect lowamplitude oscillations using learning [Level: Intermediate]
 Longrange forecasting: Develop models to forecast extremes at long time ranges. [Level: Easy]
 From Theory to Data and Back: Use theory to initialize learning machines and adapt them with data to advance theory [Level: Intermediate]
 Learnability and Predictability: Explore the relationship between what is predictable and what is learnable [Level: Hard]
 Trouble Spotter: Develop algorithms for fast anomaly detection using Crowds [Level: Easy]
Participation
There are several ways to take this course:
 Casual Exposure: You may wish to attend one or more lectures, tutorials, or guest lectures. If you aren't taking this class for credit, please make detailed notes and submit them. Consider joining one of the other students who are doing a project and supporting them in meaningful ways.
 Entry-level Participation: You want to learn specific methods. You must attend the Methods section and solve three PSETs or conduct a project. This is best for someone who wants to grow into a peer group of ML users; check out the growing community. (6 Cr) To succeed in this course at the Entry Level, you should know a programming language, and some statistics and linear algebra.
 Advanced Academic Participation: You are interested in gaining an in-depth understanding. You are well prepared either through courses or practice. You must attend the lectures and finish at least three problem sets. The methods portion can be highly useful but is not mandatory. (9 Cr) To achieve at the Advanced level, you should have preparation in more than a third of the following subjects: Linear Algebra, Introductory Probability and Statistics, Introductory ODEs/PDEs, Numerical Methods, Estimation, Inference & Information, Control, Optimization, or Dynamics. An introductory Data Science or Machine Learning course is highly useful but not compulsory.
 Research Participation: You have a specific research project that needs ML. You attend the main lectures, methods, or both, and you present a paper based on your project and course material. (3-12 Cr) To derive value at the Research level, you should be examining ML in your research (e.g., for a generals project or thesis); it is highly advisable to talk to the instructor when selecting this option.
Applications
Data Assimilation, Autonomous Environmental Mapping, Model Reduction, Uncertainty Quantification, Sensor Planning, Prediction and Predictability, Planning for Risk Mitigation, Convective Superparameterization, Radiative-Convective Equilibrium, Nonlocal Operators, Teleconnections, Particle Detection and Sizing, Species Characterization, Paleoclimate, Event Detection and Tracking in X (Volcanoes, Earthquakes, Hurricanes, Tsunamis, Storms, and Transits), Superresolution/Downscaling, Coherent Structures in Turbulence, Seismic Imaging and Geomorphology, Porous Media, Reservoirs, and Exoplanets. Other Engineering, Science, and Finance applications may be included depending on participant interest. While our motivation and reach are broad, each term we drill down to a few core applications set by participant interest.
Books
The course uses a variety of materials: books, notes, online material, and code. You can expect each of these to be released before the start of the corresponding topic in class.
 C. M. Bishop, Pattern Recognition and Machine Learning
 T. Hastie et al., The Elements of Statistical Learning
 I. Goodfellow et al., Deep Learning
 B. Schölkopf and A. J. Smola, Learning with Kernels
 S. Raschka, Python Machine Learning
 https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/

 https://www.probabilitycourse.com/ - for some of the statistics we will need