Spring 2021
MIT 12.S592 (U/G)
Lec: F 9-11, Zoom
Instructor: Sai Ravela (ravela@mit.edu)

Dynamics, Optimization, and Information foundations of Machine Learning

ML-based System Dynamics and Optimization for Earth, Planets, Climate, and Life

Flexible participation model

Prereqs: Linear Algebra, Probability & Statistics, a first course in ML, and some Systems preparation (at least one of Statistical Signal Processing, Estimation, Control, or Optimization)
Motivation
Despite numerous toolkits and easily googled introductory tutorials, useful problem-solving is often stymied beyond the initial "copy-paste" application of Machine Learning. For example, does model selection leave you dissatisfied? Does finding great features seem difficult? Do you wish learning were more efficient, or at least that the machines were better designed? Do you wonder what general information principles underpin learning, or how to learn from theory and data? If you answered yes to any of these questions, MLSDO is here to help. The course 12.S592 (MLSDO) explores machine learning from a novel and rigorous systems-dynamics and optimization perspective. This allows you to understand its strengths and weaknesses and to confidently consider learning machines for your work.
The application of machine learning to science is a central theme. We are especially interested in Earth, Planets, Climate, and Life, but engineering, materials, media, and policy are close relatives. Learning plays an increasing role in these problems. It can be useful for quantifying uncertainty and risk, and for representing parts of the physics that are difficult to model numerically. ML can even discover dynamical equations from data! In general, however, ML must contend with the extraordinary skill of theory, principles, laws, and knowledge :) ML is a relative novice: even simple physical principles are challenging to learn from data, and practical machines often encode them explicitly. Therefore, how to build hybrid systems, part theory-driven and part data-driven, is a crucial question into which we make some inroads. To put it in a picture, this is the world this course lives in:
The course centers on investigating the dynamics of learning and its optimization, embedded in dynamic data-driven application systems.
Here is a snapshot of various aspects of the course
Who Benefits?
This course is most beneficial if you have gotten past writing "Hello World" ML applications, e.g., using MNIST or CIFAR. It is quite helpful when you are looking to develop methodology for real-world problems. This course is great fun for those who like to think from first principles and want to examine ML that way. Those with a Physics, Engineering, or Mathematical preparation will particularly enjoy 12.S592. Participants from industry seem to like this course.
About Us
MLSDO (and its predecessor) has been taught since 2013. The present course is taught by a Systems Scientist together with several Guest Lecturers, including an Applied Mathematician, an AI architect from industry, and other experts who describe the successes, opportunities, and challenges for ML in their fields. ML-practicing scientists, many a product of previous versions of this course, offer tutorials to help you become skillful and support your ML growth.
Structure and Content
MLSDO is an "infinite course," continuing each term as it round-robins among topics. Please attend the first two lectures; the material moves quickly! Typically we have the following components: Lectures (fundamentals), Guest Lectures (applications), Tutorials (methods), and PSETs/Projects (problem-solving). Here is the proposed set for this term:
Foundations: Dynamics of Learning
 Systems Dynamics and Optimization (SDO): First, we lose the "irrational exuberance."
 Predictive Systems and Discovery Machines that interact with the complex, non-stationary real world in a variety of physical applications.
 Theory, Models, Data, and the SDO cycle.
 How ML promises to improve efficacy and reduce model error.
 Model Error in Learning and its disastrous effects.
 Do we need theory to help Learning? The role that Conservation, Invariance, Symmetry, Smoothness, Sparsity, Feedbacks and other "theorydriven" constraints play on learning.
 What information principles help machines become efficacious? The notion of Information Gain.
 In an Ideal World: Using a linear-Gaussian world, we look at optimal estimation, developing basic ideas in organizing data into features, inference on graphs, and learning for information gain. We close the loop for sequential processing through recursive estimation. We then examine how the linear-Gaussian picture degrades in a nonlinear, non-Gaussian world with complex structure.
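The recursive-estimation idea above fits in a few lines. The following is an illustrative recursive least-squares loop (names and numbers are our own, not course material): each new linear-Gaussian observation updates the current estimate and its covariance.

```python
import numpy as np

def rls_update(theta, P, x, y, noise_var=1.0):
    """One recursive least-squares step: fold the observation (x, y)
    into estimate theta with covariance P (linear-Gaussian setting)."""
    x = x.reshape(-1, 1)
    # Kalman-style gain: how much to trust the new observation.
    K = P @ x / (noise_var + float(x.T @ P @ x))
    theta = theta + K.flatten() * (y - float(x @ theta if x.ndim == 1 else x.T @ theta.reshape(-1, 1)))
    P = P - K @ x.T @ P
    return theta, P

# Recover a known linear model y = 2*x0 - x1 from streaming data.
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
theta, P = np.zeros(2), 10.0 * np.eye(2)  # broad prior
for _ in range(50):
    x = rng.normal(size=2)
    theta, P = rls_update(theta, P, x, x @ true_theta)
# theta is now close to true_theta, and P has shrunk.
```

The same recursion, with a dynamics model inserted between updates, is the Kalman filter's measurement step.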
 Learning Dilemmas: We understand fundamentally how difficult it is to "learn well."
 Risk vs. Empirical Risk, Generalization vs. Extrapolation, Stability vs. Generalization, Consistency
 Localization, hyperparameters, Bias vs. Variance, Invariance vs. Selectivity, No Free Lunch and Universal Computation.
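The risk-vs-empirical-risk dilemma above can be made concrete with a toy experiment (our own illustration, not course code): a high-degree polynomial drives the empirical (training) risk below that of a simple model, while its true risk, estimated on noise-free held-out data, stays larger.

```python
import numpy as np

# Training data: noisy samples of a smooth function; test data is noise-free.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.size)
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(3 * x_test)

def risks(degree):
    """Empirical (training) risk and an estimate of the true risk
    for a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    emp = np.mean((np.polyval(coeffs, x) - y) ** 2)
    true = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return emp, true

emp_lo, true_lo = risks(1)    # high bias: both risks are large
emp_hi, true_hi = risks(15)   # high variance: training risk shrinks, true risk does not
```

Minimizing empirical risk alone thus says little about generalization; the gap between `emp_hi` and `true_hi` is the point.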
 Inference on Graphs:
 Variational Inference - two-point boundary value problems
 Bayesian Inference - sampling, ensemble methods, and variational Bayes
 Probabilistic Graphical Models with Model Selection.
 Kernels: We then explore data and features. Kernels (symmetric positive-definite forms), graph spectra, key kernel applications, and the "kernelization" of many ML algorithms.
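As a sketch of what "kernelization" means (our own illustration, not course code): ridge regression rewritten so the data enter only through evaluations of a symmetric positive-definite kernel, here an RBF Gram matrix. The kernel width `gamma` and penalty `lam` below are made-up values.

```python
import numpy as np

def rbf_gram(X, Z, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2), a symmetric
    positive-definite kernel, the basic object behind kernelization."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kernel_ridge(X, y, X_new, gamma=25.0, lam=1e-4):
    """Ridge regression 'kernelized': solve in the dual, so inner
    products are replaced by kernel evaluations."""
    K = rbf_gram(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # dual coefficients
    return rbf_gram(X_new, X, gamma) @ alpha

# Fit a smooth 1-D function; predictions at the training points are close.
X = np.linspace(0, 1, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
pred = kernel_ridge(X, y, X)
```

The same substitution (Gram matrix for inner products) yields KPCA, SVMs, and Gaussian process regression from their linear counterparts.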
 Measures and Information: We get to the very core by thinking together about similarity measures and how to optimize them in practice. We study
 p-norms, RKHS,
 Entropy, Mutual Information, and Quadratic Mutual Information
 Optimal Transport and Field Alignment.
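A minimal sketch of the information measures just listed (illustrative only): Shannon entropy and the identity I(X;Y) = H(X) + H(Y) - H(X,Y) for a discrete joint distribution.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a normalized probability array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a discrete joint pxy."""
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy.ravel())

# Perfectly dependent pair: I(X;Y) = H(X) = 1 bit.
pxy_dep = np.array([[0.5, 0.0],
                    [0.0, 0.5]])
# Independent pair: I(X;Y) = 0 bits.
pxy_ind = np.outer([0.5, 0.5], [0.5, 0.5])
mi_dep = mutual_information(pxy_dep)   # 1.0
mi_ind = mutual_information(pxy_ind)   # 0.0
```

Estimating these quantities from samples rather than known distributions is where the practical difficulty, and the quadratic mutual information variants, come in.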
 Regularization: We investigate the tussle between "fitting" and "constraining."
 Tikhonov, p-norm, Natural Statistics, Entropy, Randomization, Augmentation, Leave-out, Theory-guided.
 Smoothness vs. sparsity, Relationship to Bayes
 Tunable Sparsity.
 Concentration Inequalities with application to model selection.
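The "fitting vs. constraining" tussle has a one-line closed form in the Tikhonov case. The sketch below (our own, with made-up numbers) shows how a small penalty stabilizes an ill-conditioned fit with nearly collinear columns.

```python
import numpy as np

def tikhonov(A, b, lam):
    """Tikhonov-regularized least squares:
    argmin_x ||A x - b||^2 + lam ||x||^2 = (A^T A + lam I)^{-1} A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Nearly collinear columns make the unregularized fit explode.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
b = np.array([2.0, 2.1, 1.9])
x_small = tikhonov(A, b, 1e-10)  # nearly unregularized: huge, unstable weights
x_reg = tikhonov(A, b, 1e-2)     # constrained: small, stable weights near [1, 1]
```

Replacing the squared-norm penalty with a p-norm or an entropy term changes the constraint's character (sparsity, smoothness) but not this basic trade-off.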
 Dynamics of Deep Learning [32]:
 Dynamical systems associated with Learning
 Two-Point Boundary Value Problems, Error Dynamics, and Stability.
 Bayesian Parameter Uncertainty Estimation
 Stochastic formulation and Information Transfer Efficiency.
 Informative Learning [36]:
 Understand the notion of information gain and its unifying framework for data selection, parameter and structure selection, feature selection, and adaptation.
 Understand the hardness of quantifying information gain
 Develop practical strategies such as
 sparse optimization and playing the game of 20 questions
 Learning for Information Gain applied to targeting data and parameter selection.
 Estimating Network Structure  Initialization and adaptation
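Greedy information-gain selection, the 20-questions game above, can be sketched as follows (an illustrative toy, not the course's method): pick the query whose answer maximally reduces the expected entropy over the hypotheses.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a normalized probability array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def best_question(prior, answers):
    """Pick the question with maximal expected information gain.
    answers[q, h] is 1 if question q is answered 'yes' under hypothesis h."""
    gains = []
    for row in answers:
        p_yes = prior[row == 1].sum()
        cond = 0.0  # expected posterior entropy over the two answers
        for val, p in ((1, p_yes), (0, 1.0 - p_yes)):
            if p > 0:
                post = prior * (row == val) / p
                cond += p * entropy(post)
        gains.append(entropy(prior) - cond)
    return int(np.argmax(gains)), gains

# Four equally likely hypotheses. Question 0 splits them 2/2 (a full bit
# of gain); question 1 splits them 3/1 (less informative).
prior = np.full(4, 0.25)
answers = np.array([[1, 1, 0, 0],
                    [1, 1, 1, 0]])
q, gains = best_question(prior, answers)
```

The same greedy criterion, applied to candidate data, parameters, or network structure, is the unifying thread in the bullets above.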
 Neural Dynamical Systems [39]:
 Define and Understand Continuous- and Discrete-time Neural Dynamical Systems
 Develop Exact Networks and Bounds on Approximate Networks
 Develop Stable Learning Formulations for Hybrid Neural Dynamics
 Develop approaches to Uncertainty Quantification
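As a hedged sketch of a discrete-time neural dynamical system (our own toy, not course code): a forward-Euler step of dx/dt = f_theta(x), with weights chosen by hand so the network mimics linear decay.

```python
import numpy as np

def neural_ode_step(x, W1, b1, W2, b2, dt):
    """One forward-Euler step of dx/dt = f(x), where f is a small tanh
    network: the discrete-time view of a neural dynamical system."""
    f = W2 @ np.tanh(W1 @ x + b1) + b2
    return x + dt * f

# Hand-set weights so the vector field mimics linear decay dx/dt = -x
# (tanh stays in its linear regime at these magnitudes).
W1 = 0.01 * np.eye(2)
b1 = np.zeros(2)
W2 = -100.0 * np.eye(2)
b2 = np.zeros(2)

x = np.array([1.0, -0.5])
for _ in range(100):
    x = neural_ode_step(x, W1, b1, W2, b2, dt=0.01)
# After integrating to t = 1, x is close to exp(-1) times its initial value.
```

Learning replaces the hand-set weights with trained ones; stability of the discrete map under such training is one of the formulation questions above.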
 Relevance, Recommendation, and Reinforcement [Optional]: Many realworld applications with dynamical systems will require the learning machine to keep up with a changing world.
 Feedback mechanisms for incremental (and often online) learning using notions of information gain.
 Application to (some subset of) Search Engines, Recommender Systems, Stochastic Dynamic Programming, Control, Estimation, and Reinforcement Learning
The optional topics are ones that we may insert into the course by removing others from the list. We'll do this in the first two classes through a discussion.
Guest and Special Lectures [39]
 Sara Seager [Exoplanets]
 Dirk Smit [Seismic Inversion]
 Hao Sun [Learning Physics from Data]
 Brent Minchew [Glaciers]
Methods (These change from term to term). [1 each]
Starting in Fall '19, a section was introduced for methods. The group of ML practitioners has grown rapidly, and the community this course has fostered enables us to form an effective peer group to support your learning and growth.
 Trees, Hashes, and Nearest Neighbors - Indexing Schemes Tutorial
 Principal and Independent Components
 Sampling and Hamiltonian Monte Carlo
 Expectation Maximization with Application
 Gaussian Graphical Models
 Kernel Machines - GPR, KPCA, SVM, and Manifolds (x2)
 Convolutional Networks
 Long Short-Term Memory
 AutoEncoders
 Generative Adversarial Networks
 Boosting and Bagging
PSETS: Application to Systems Dynamics and Optimization
The PSET problems highlight key topics of interest in learning dynamics; both real data sets and idealized models may be used to communicate key ideas and the potential role of learning. Three PSETs from the following topics are typically developed. The topics themselves are drawn from applications (see the end of this page).
 Detection: Use theory and data to learn a very low SNR detector. Use learning to solve inverse problems more efficaciously [Level: Intermediate]
 Targeted Uncertainty Quantification: Learn reduced models for rapid, targeted uncertainty quantification [Level: Intermediate]
 Informative Learning: Automatically select parameters, data and model structure to maximize information gain while learning with continuous time neural networks [Level: Hard]
 Stable Parameterization: Learn to build hybrid physicalneural dynamical systems that integrate in a stable manner [Level: Easy]
 Interval Smoothing: Rapidly train a neural dynamical system from data [Level: Easy]
 Downscaling: Use Machine Learning to produce a superresolution method for fluid dynamics [Level: Hard]
 Oscillator Discovery: Use model simulations of fluids to detect lowamplitude oscillations using learning [Level: Intermediate]
 Longrange forecasting: Develop models to forecast extremes at long time ranges. [Level: Easy]
 From Theory to Data and Back: Use theory to initialize learning machines and adapt them with data to advance theory [Level: Intermediate]
 Learnability and Predictability: Explore the relationship between what is predictable and what is learnable [Level: Hard]
 Trouble Spotter: Develop algorithms for fast anomaly detection using Crowds [Level: Easy]
Participation
There are several ways to take this course:
 Casual Exposure: You may wish to attend one or more lectures, tutorials, or guest lectures. If you aren't taking this class for credit, please make detailed notes and submit them. Consider joining one of the other students who are doing a project and supporting them in meaningful ways.
 Entry-level Participation: You want to learn specific methods. You must attend the Methods section and solve three PSETs or conduct a project. This is best for someone who wants to grow into a peer group of ML users; check out the growing community. (6 Cr) To succeed in this course at the Entry Level, you should know a programming language, and some statistics and linear algebra.
 Advanced Academic Participation: You are interested in gaining an in-depth understanding. You are well prepared either through courses or practice. You must attend the lectures and finish at least three problem sets. The methods portion can be highly useful but is not mandatory. (9 Cr) To achieve at the Advanced level, you should have preparation in more than a third of the following subjects: Linear Algebra, Introductory Probability and Statistics, Introductory ODEs/PDEs, Numerical Methods, Estimation, Inference & Information, Control, Optimization, or Dynamics. An introductory Data Science or Machine Learning course is highly useful but not compulsory.
 Research Participation: You have a specific research project that needs ML. You attend the main lectures, methods, or both, and you present a paper based on your project and course material. (3-12 Cr) To derive value at the Research level, you should be examining ML in your research (e.g., for a generals project or thesis); it is highly advisable to talk to the instructor when selecting this option.
Applications
Data Assimilation, Autonomous Environmental Mapping, Model Reduction, Uncertainty Quantification, Sensor Planning, Prediction and Predictability, Planning for Risk Mitigation, Convective Superparameterization, Radiative-Convective Equilibrium, Nonlocal Operators, Teleconnections, Particle Detection and Sizing, Species Characterization, Paleoclimate, Event Detection and Tracking in X (Volcanoes, Earthquakes, Hurricanes, Tsunamis, Storms, and Transits), Superresolution/Downscaling, Coherent Structures in Turbulence, Seismic Imaging and Geomorphology, Porous Media, Reservoirs, and Exoplanets. Other Engineering, Science, and Finance applications may be included depending on participant interest. While our motivation and reach are broad, each term we drill down to a few core applications set by participant interest.
Books
The course uses a variety of materials: books, notes, online material, and code. You can expect each of these to be released before the start of the corresponding topic in class.
 C. M. Bishop, Pattern Recognition and Machine Learning
 T. Hastie et al., The Elements of Statistical Learning
 I. Goodfellow et al., Deep Learning
 B. Schölkopf and A. J. Smola, Learning with Kernels
 S. Raschka, Python Machine Learning
 https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/

 https://www.probabilitycourse.com/ - for some of the statistics we will need