
In Spring 2023, many previous participants are meeting informally. The course will formally resume next term.

In Spring 2023, this course is being offered at the Instituto Politécnico Nacional (IPN), Mexico.

Prior Offering: Fall 2022, MIT: 12.S592 U/G (MIT Registration, Canvas). Lec: F 9-11, Rm: 35-308 and Zoom
Instructor: Sai Ravela (ravela@mit.edu)

Dynamics, Optimization, and Information foundations of Machine Learning

ML-based System Dynamics and Optimization for Earth, Planets, Climate, and Life

Flexible participation model

Prereqs: Linear Algebra, Probability & Statistics, a first course in ML, and some Systems preparation (at least one of Statistical Signal Processing, Estimation, Control, or Optimization)
Motivation
Despite numerous toolkits and readily Googled introductory tutorials, useful problem-solving is often stymied beyond the initial "copy-paste" application of Machine Learning. For example, do model selection procedures leave you dissatisfied? Does finding good features seem difficult? Do you wish learning were more efficient, or at least the machines better designed? Do you wonder what general information principles underpin learning, or how to learn from theory and data? If you answered yes to any of these questions, MLSDO is here to help. The course 12.S592 (MLSDO) explores machine learning from a novel and rigorous systems dynamics and optimization perspective. This allows you to understand its strengths and weaknesses and to confidently consider learning machines for your work.
The application of machine learning to science is a central theme. We are especially interested in Earth, Planets, Climate, and Life, but engineering, materials, media, and policy are close relatives. Learning plays an increasing role in these problems. It can be useful for quantifying uncertainty and risk, and for representing parts of the physics that are difficult to model numerically. ML can even discover dynamical equations from data! In general, however, ML must contend with the extraordinary skill of theory, principles, laws, and knowledge. ML is a relative novice: even simple physical principles, which practical machines often explicitly encode, are challenging to learn from data. Therefore, building hybrid systems, part theory and part data-driven, is a crucial question into which we make some inroads. To put it in a picture, this is the world this course lives in:
The course centers on investigations of the dynamics of learning and its optimization, embedded in dynamic data-driven application systems.
Here is a snapshot of some aspects of the course:
Who Benefits?
This course is most beneficial if you have gotten past writing "Hello World" ML applications, e.g., using MNIST or CIFAR. It is quite helpful when you are looking to develop methodology for real-world problems. The course is most fun for those who like to think from first principles and want to examine ML that way. Those with a Physics, Engineering, or Mathematical preparation will particularly enjoy 12.S592. Participants from industry seem to like this course. You will benefit most if you do the PSET problems or have a research topic that maps well to the course.
About Us
DOLS (and its predecessors) has been taught since 2012. The present course is taught by a Systems Scientist and involves Guest Lecturers, including an Applied Mathematician, an AI architect from industry, and other experts who describe the successes, opportunities, and challenges for ML in their fields. ML-practicing scientists, many a product of previous versions of this course, offer tutorials to help you become skillful and support your ML growth.
Structure and Content
MLSDO is an "infinite course," continuing each term as it rotates among topics in round-robin fashion. Please attend the first two lectures; the material moves quickly! The course is divided into a Fall-Spring sequence and a Summer-Winter sequence. Core material is covered in the Fall and Spring terms, with Spring focusing on Informative Optimization and Fall focusing on the Dynamics and Optimization of Learning Systems. Here is the proposed set for this term:
Fall-Spring Foundations: Informative Optimization and Dynamics of Learning
 Systems Dynamics and Optimization (SDO): First, we lose the "irrational exuberance."
 Predictive Systems and Discovery Machines that interact with the complex, nonstationary real world in a variety of physical applications.
 Theory, Models, Data, and the SDO cycle.
 How ML promises to improve efficacy and reduce model error.
 Model Error in Learning and its disastrous effects.
 Do we need theory to help Learning? The role that Conservation, Invariance, Symmetry, Smoothness, Sparsity, Feedbacks and other "theorydriven" constraints play on learning.
 What information principles help machines become efficacious? The notion of Information Gain.
 In an Ideal World: Using a linear-Gaussian world, we look at optimal estimation, developing basic ideas in organizing data into features, inference on graphs, and learning for information gain. We close the loop for sequential processing through recursive estimation. We examine how the linear-Gaussian world degrades in a nonlinear, non-Gaussian world with complex structure.
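The recursive-estimation loop above can be made concrete. Below is a minimal, hypothetical Kalman-style update for the linear-Gaussian setting (variable names and values are illustrative, not course code); each observation corrects the mean and shrinks the covariance:

```python
import numpy as np

def kalman_update(mu, P, y, H, R):
    """One recursive-estimation step in a linear-Gaussian world.
    mu, P: prior mean and covariance; y: observation;
    H: observation matrix; R: observation-noise covariance."""
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    mu_new = mu + K @ (y - H @ mu)          # corrected mean
    P_new = (np.eye(len(mu)) - K @ H) @ P   # reduced uncertainty
    return mu_new, P_new

# Sequentially assimilating observations shrinks the posterior variance.
mu = np.zeros(2)
P = np.eye(2)
H = np.eye(2)
R = 0.5 * np.eye(2)
for y in [np.array([1.0, 0.5]), np.array([0.9, 0.6])]:
    mu, P = kalman_update(mu, P, y, H, R)
```

Note how the posterior covariance decreases with each observation, closing the loop for sequential processing.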
 Learning Dilemmas: We understand fundamentally how difficult it is to "learn well."
 Risk vs. Empirical Risk, Generalization vs. Extrapolation, Stability vs. Generalization, Consistency
 Localization, hyperparameters, Bias vs. Variance, Invariance vs. Selectivity, No Free Lunch and Universal Computation.
 Inference on Graphs (Graphical Models):
 Variational Inference: two-point boundary value problems
 Bayesian Inference: sampling, ensemble methods, and variational Bayes
 Probabilistic Graphical Models with Model Selection.
 Kernels (Kernel Machines): We then explore data and features. Kernels (symmetric positive definite forms), graph spectra, key kernel applications, and the "kernelization" of many ML algorithms.
 We pay attention to the construction of the manifold (in a data-driven sense) and look at embeddings.
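As a small illustration of "kernelization," here is a hypothetical sketch of kernel ridge regression with a Gaussian (RBF) kernel, a symmetric positive definite form. The function names and parameter values are illustrative assumptions, not course material:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel: a symmetric positive-definite form."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, lam, gamma):
    """Dual coefficients alpha = (K + lam I)^{-1} y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_test, gamma):
    # Prediction is a kernel expansion over the training points.
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Fit a smooth function on illustrative 1-D data.
X = np.linspace(0, 1, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
alpha = kernel_ridge_fit(X, y, lam=1e-6, gamma=50.0)
y_hat = kernel_ridge_predict(X, alpha, X, gamma=50.0)
```

The same dual trick kernelizes many linear algorithms (PCA, SVM, regression) by replacing inner products with kernel evaluations.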
 Measures and Information: We touch the very core by thinking together about similarity measures and how to optimize them in practice. We study
 p-norms, RKHS,
 Entropy, Mutual Information, and Quadratic Mutual Information
 Optimal Transport and Field Alignment.
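To make entropy and mutual information concrete, here is a minimal sketch computing both from a small joint probability table; the tables and function names are illustrative assumptions:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint probability table."""
    px = pxy.sum(axis=1)   # marginal of X
    py = pxy.sum(axis=0)   # marginal of Y
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

# Perfectly dependent binary variables share one full bit.
pxy_dep = np.array([[0.5, 0.0],
                    [0.0, 0.5]])
# Independent fair coins share zero bits.
pxy_ind = np.full((2, 2), 0.25)
```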
 Regularization: We investigate the tussle between "fitting" and "constraining."
 Tikhonov, p-norm, natural statistics, entropy, randomization, augmentation, leave-out, theory-guided.
 Smoothness vs. sparsity, Relationship to Bayes
 Tunable Sparsity.
 Concentration Inequalities with application to model selection.
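The tussle between "fitting" and "constraining" can be seen in a few lines of Tikhonov (ridge) regression; the data and parameter choices below are illustrative assumptions:

```python
import numpy as np

def tikhonov(X, y, lam):
    """Ridge regression: minimize ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=30)

w_fit = tikhonov(X, y, lam=0.0)     # pure "fitting" (ordinary least squares)
w_reg = tikhonov(X, y, lam=100.0)   # heavy "constraining" shrinks the weights
```

Increasing lam trades data fit for a smaller-norm solution, which also connects to a Gaussian prior in the Bayesian view.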
 Dynamics of Deep Learning [32]:
 Dynamical systems associated with Learning
 Two-Point Boundary Value Problems, Error Dynamics, and Stability.
 Bayesian Parameter Uncertainty Estimation
 Stochastic formulation and Information Transfer Efficiency.
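A minimal sketch of the dynamical-systems view of learning: gradient descent on a quadratic loss is a linear discrete-time system whose stability hinges on the step size. The matrix and step sizes below are illustrative assumptions:

```python
import numpy as np

# Gradient descent on L(w) = 0.5 * w^T A w is the discrete dynamical
# system w_{k+1} = (I - eta * A) w_k. It is stable iff every eigenvalue
# of (I - eta * A) lies inside the unit circle, i.e. eta < 2 / lambda_max(A).
A = np.diag([1.0, 10.0])  # curvatures 1 and 10, so eta must be < 0.2

def run(eta, steps=50):
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - eta * (A @ w)  # one gradient step = one map iteration
    return w

w_stable = run(eta=0.1)     # 0.1 < 2/10: both modes contract toward 0
w_unstable = run(eta=0.25)  # 0.25 > 2/10: the stiff mode diverges
```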
 Informative Decisions: Sampling, Estimation, Optimization, Control, and Learning:
 Understand the notion of information gain and its unifying framework for data selection, parameter and structure selection, feature selection, and adaptation.
 Understand the hardness of quantifying information gain.
 Develop practical strategies such as
 sparse optimization, playing the game of 20 questions
 Learning for Information Gain applied to targeting data and parameter selection.
 Estimating Network Structure: initialization and adaptation
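The game-of-20-questions strategy for information gain can be sketched as greedy bisection: each yes/no question is chosen to halve the hypothesis set, gaining roughly one bit per question. This is a toy illustration, not course code:

```python
def twenty_questions(low, high, secret):
    """Greedy bisection over the hypothesis set {low, ..., high}.
    Each question "is it <= mid?" is maximally informative because it
    (nearly) halves the remaining hypotheses, gaining ~1 bit."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        if secret <= mid:
            high = mid
        else:
            low = mid + 1
        questions += 1
    return low, questions

# 1024 hypotheses require exactly log2(1024) = 10 optimal questions.
guess, n = twenty_questions(0, 1023, secret=617)
```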
 Neural Dynamical Systems [39]:
 Define and Understand Continuous and Discrete-time Neural Dynamical Systems
 Develop Exact Networks and Bounds on Approximate Networks
 Develop Stable Learning Formulations for Hybrid Neural Dynamics
 Develop approaches to Uncertainty Quantification
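A continuous-time neural dynamical system can be sketched by forward-Euler discretization of dx/dt = f(x), with a small network for f. Everything below (weights, step size, shapes) is an illustrative assumption:

```python
import numpy as np

def neural_ode_step(x, W1, W2, dt):
    """One forward-Euler step of the continuous-time neural dynamical
    system dx/dt = W2 tanh(W1 x), i.e. the residual form x + dt * f(x)."""
    return x + dt * (W2 @ np.tanh(W1 @ x))

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(8, 2))   # lift state to 8 hidden units
W2 = rng.normal(scale=0.5, size=(2, 8))   # project back to the 2-D state

# Roll out a short trajectory; tanh keeps the vector field bounded.
x = np.array([1.0, -1.0])
traj = [x]
for _ in range(100):
    x = neural_ode_step(x, W1, W2, dt=0.01)
    traj.append(x)
traj = np.array(traj)
```

Training such a system amounts to fitting W1, W2 so the rolled-out trajectory matches data, with stability of the discretization an explicit concern.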
 Relevance, Recommendation, and Reinforcement [Optional]: Many real-world applications with dynamical systems require the learning machine to keep up with a changing world.
 Feedback mechanisms for incremental (and often online) learning using notions of information gain.
 Application to (some subset of) Search Engines, Recommender Systems, Stochastic Dynamic Programming, Control, Estimation, and Reinforcement Learning
The optional topics are ones we may insert into the course by removing others from the list; we will decide this through discussion in the first two classes.
Guest and Special Lectures
 Sara Seager [Exoplanets]
 Dirk Smit [Seismic Inversion]
 Hao Sun [Learning Physics from Data]
 Brent Minchew [Glaciers]
Winter-Summer Methods:
Starting in 2019, a section was introduced for methods. The community of ML practitioners this course has fostered has grown rapidly, enabling us to form an effective peer group to support your learning and growth.
 Trees, Hashes, and Nearest Neighbors: an indexing schemes tutorial
 Principal and Independent Components
 Sampling and Hamiltonian Monte Carlo
 Expectation Maximization with Application
 Gaussian Graphical Models
 Kernel Machines: GPR, KPCA, SVM
 Manifold Learning primer
 Convolutional Networks
 Long ShortTerm Memory
 Variational AutoEncoders
 Generative Adversarial Networks
 Boosting and Bagging
PSETS: Application to Systems Dynamics and Optimization
The PSET problems highlight key topics of interest in learning dynamics, where both real data sets and idealized models may be used to communicate key ideas and the potential role of learning. Three PSETs drawn from the following topics are typically developed. The topics themselves come from applications (see the end of this page).
 Detection: Use theory and data to learn a very low-SNR detector; use learning to solve inverse problems more efficaciously [Level: Intermediate]
 Informative Uncertainty Quantification: Learn reduced models for rapid, targeted uncertainty quantification [Level: Intermediate]
 Informative Learning: Automatically select parameters, data and model structure to maximize information gain while learning with continuous time neural networks [Level: Hard]
 Stable Parameterization: Learn to build hybrid physical-neural dynamical systems that integrate in a stable manner [Level: Easy]
 Interval Smoothing: Rapidly train a neural dynamical system from data [Level: Easy]
 Downscaling: Use Machine Learning to produce a superresolution method for fluid dynamics [Level: Hard]
 Oscillator Discovery: Use model simulations of fluids to detect lowamplitude oscillations using learning [Level: Intermediate]
 Longrange forecasting: Develop neural dynamical models to forecast extremes at long time ranges. [Level: Easy]
 From Theory to Data and Back: Use theory to initialize learning machines and adapt them with data to advance theory [Level: Intermediate]
 Learnability and Predictability: Explore the relationship between what is predictable and what is learnable [Level: Hard]
 Trouble Spotter: Develop algorithms for fast anomaly detection using Crowds [Level: Easy]
 What's the equation? Develop an algorithm to learn the equations from physical data [Level: Intermediate]
Participation
There are several ways to take this course:
 Casual Exposure: You may wish to attend one or more lectures, tutorials, or guest lectures. If you aren't taking this class for credit, please make detailed notes and submit them. Consider joining one of the other students who are doing a project and supporting them in meaningful ways.
 Entry-level Participation: You want to learn specific methods. You must attend the Methods section and solve three PSETs or conduct a project. This is best for someone who wants to grow into a peer group of ML users; check out the growing community. (6 Cr.) To succeed in this course at the entry level, you should know a programming language and some statistics and linear algebra.
 Advanced Academic Participation: You are interested in gaining an in-depth understanding. You are well prepared, either through courses or practice. You must attend the lectures and finish at least three problem sets. The methods portion can be highly useful but is not mandatory. (9 Cr.) To achieve at the advanced level, you should have preparation in more than a third of the following subjects: Linear Algebra, Introductory Probability and Statistics, Introductory ODEs/PDEs, Numerical Methods, Estimation, Inference & Information, Control, Optimization, or Dynamics. An introductory Data Science or Machine Learning course is highly useful but not compulsory.
 Research Participation: You have a specific research project that needs ML. You attend the main lectures, the methods section, or both, and you present a paper based on your project and the course material. It is highly advisable to talk to the instructor when selecting this option (3-12 Cr). To derive value at the research level, you should be examining ML in your research (e.g., for a generals project or thesis).
Applications
Data Assimilation, Autonomous Environmental Mapping, Model Reduction, Uncertainty Quantification, Sensor Planning, Prediction and Predictability, Planning for Risk Mitigation, Convective Superparameterization, Radiative-Convective Equilibrium, Nonlocal Operators, Teleconnections, Particle Detection and Sizing, Species Characterization, Paleoclimate, Event Detection and Tracking in X (Volcanoes, Earthquakes, Hurricanes, Tsunamis, Storms, and Transits), Superresolution/Downscaling, Coherent Structures in Turbulence, Seismic Imaging and Geomorphology, Porous Media, Reservoirs, and Exoplanets. Other Engineering, Science, and Finance applications may be included depending on participant interest. While our motivation and reach are broad, each term we drill down to a few core applications set by participant interest.
Books
The course uses a variety of material: books, notes, online material, and code. You can expect each of these to be released before the start of the corresponding topic in class.
 C. M. Bishop, "Pattern Recognition and Machine Learning"
 T. Hastie et al. The Elements of Statistical Learning
 I. Goodfellow et al., Deep Learning
 B. Schölkopf and A. Smola, Learning with Kernels
 S. Raschka, Python Machine Learning
 https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/

https://www.probabilitycourse.com/ - for some statistics we will need
Sai Ravela (ravela@mit.edu)