Learning Systems Dynamics and Optimization (MLSDO)

Spring 2021

MIT: 12.S592  U/G
           
Lec: F. 9-11 : Zoom
           

Instructor: Sai Ravela (ravela@mit.edu)

TL;DR
  • Dynamics, Optimization, and Information foundations of Machine Learning
  • ML-based System Dynamics and Optimization for Earth, Planets, Climate, and Life
  • Flexible participation model
  • Pre-reqs: Linear Algebra, Probability & Stat, a first course in ML, and some Systems preparation (at least one of Statistical Signal Processing, Estimation, Control or Optimization)

Motivation

Despite numerous toolkits and excellent, easily searchable introductory tutorials, useful problem-solving often stalls beyond the initial "copy-paste" application of Machine Learning. For example, does model selection leave you dissatisfied? Does finding great features seem difficult? Do you wish learning were more efficient, or at least that the machines were better designed? Do you wonder what general Information principles underpin learning, or how to learn from both theory and data? If you answered yes to any of these questions, MLSDO is here to help. The course 12.S592 (MLSDO) explores machine learning from a novel and rigorous systems dynamics and optimization perspective. This allows you to understand the strengths and weaknesses of learning machines and to consider them confidently for your work.

 

The application of machine learning to science is a central theme. We are especially interested in Earth, Planets, Climate and Life, but engineering, materials, media, and policy are close relatives. Learning is playing an increasing role in these problems. It can be useful for quantifying uncertainty and risk, and for representing parts of the physics that are difficult to model numerically. ML can even discover dynamical equations from data! However, in general, ML must often contend with the extraordinary skill of theory, principles, laws, and knowledge :) ML is a relative novice -- even simple physical principles, which practical machines often encode explicitly, are challenging to learn from data. Therefore, how to build hybrid systems that are part theory-driven and part data-driven is a crucial question, one that we make some inroads into. To put it in a picture, this is the world this course lives in:

Dynamic Data Driven Systems Science

The course is centered on investigating the dynamics of learning, its optimization, and its embedding in dynamic data-driven application systems.
 

Here is a snapshot of various aspects of the course

 

Who Benefits?

 

This course is most beneficial if you have gotten past writing “Hello World” ML applications, e.g., using MNIST or CIFAR. It is especially helpful when you are looking to develop methodology for real-world problems. This course is great fun for those who like to think from first principles and want to examine ML that way. Those with a Physics, Engineering or Mathematical preparation will particularly enjoy 12.S592. Participants from industry seem to like this course.

About Us

MLSDO and its predecessor have been taught since 2013. The present course is taught by a Systems Scientist together with several Guest Lecturers, including an Applied Mathematician, an AI architect from industry, and other experts who describe the successes, opportunities and challenges for ML in their fields. ML-practicing scientists, many of them products of previous versions of this course, offer tutorials to help you become skillful and support your ML growth.

Structure and Content

MLSDO is an "infinite course," continuing each term as it cycles round-robin through topics. Please attend the first two lectures; the material is cast quickly! Typically we have the following components: Lectures (fundamentals), Guest Lectures (application), Tutorials (methods), and PSETs/Projects (problem-solving). Here is the proposed set for this term:

Foundations: Dynamics of Learning

  1. Systems Dynamics and Optimization (SDO):  First, we lose the "irrational exuberance."
    • Predictive Systems and Discovery Machines to interact with the complex non-stationary real-world in a variety of physical applications.
    • Theory, Models, Data, and the SDO cycle.
      • How ML promises to improve efficacy and reduce model error.
    • Model Error in Learning and its disastrous effects. 
      • Do we need theory to help Learning? The role that Conservation, Invariance, Symmetry, Smoothness, Sparsity, Feedbacks and other "theory-driven" constraints play on learning.
    • What information principles help machines become efficacious?  The notion of Information Gain. 
  2. In an Ideal World: Using a linear-Gaussian world, we look at optimal estimation, developing basic ideas in organizing data into features, inference on graphs, and learning for information gain. We close the loop for sequential processing through recursive estimation. We examine how the linear-Gaussian approach degrades in a nonlinear, non-Gaussian world with complex structure.
  3. Learning Dilemmas: We understand fundamentally how difficult it is to "learn well."
    1. Risk vs. Empirical Risk, Generalization vs. Extrapolation, Stability vs. Generalization, Consistency
    2. Localization, hyperparameters, Bias vs. Variance, Invariance vs. Selectivity, No Free Lunch and Universal Computation.
  4. Inference on Graphs
    • Variational Inference -- two-point boundary value problems
    • Bayesian Inference -- sampling, ensemble methods and variational Bayes
    • Probabilistic Graphical Models with Model Selection.
  5. Kernels: We then explore data and features. Kernels (Symmetric Positive Definite forms), Graph Spectra, key kernel applications, "Kernelization" of many ML algorithms.
  6. Measures and Information: We touch the very core by thinking together about similarity measures and how to optimize them in practice. We study
    • p-norms, RKHS
    • Entropy, Mutual Information, and Quadratic Mutual Information
    • Optimal Transport and Field Alignment.  
  7. Regularization: We investigate the tussle between "fitting" and "constraining."
    • Tikhonov, p-norm, Natural Statistics, Entropy, Randomization, Augmentation, Leave-out, Theory-guided.
    • Smoothness vs. sparsity, Relationship to Bayes
    • Tunable Sparsity.
    • Concentration Inequalities with application to model selection.
  8. Dynamics of Deep Learning  [32]:
    • Dynamical systems associated with Learning
    • Two-Point Boundary Value Problems, Error Dynamics and Stability.
    • Bayesian Parameter Uncertainty Estimation
    • Stochastic formulation and Information Transfer Efficiency.
  9. Informative Learning [36]:
    • Understand the notion of information gain and its unifying framework for data selection, parameter and structure selection, feature selection, and adaptation.
    • Understand the hardness of quantifying information gain
    • Develop practical strategies such as
      • sparse optimization, playing the Game of 20-questions
    • Learning for Information Gain applied to targeting data and parameter selection. 
    • Estimating Network Structure -- Initialization and adaptation 
  10. Neural Dynamical Systems [39]:
    • Define and Understand Continuous and Discrete-time Neural Dynamical Systems 
    • Develop Exact Networks and Bounds on Approximate Networks
    • Develop Stable Learning Formulations for Hybrid Neural Dynamics
    • Develop approaches to Uncertainty Quantification
  11. Relevance, Recommendation, and Reinforcement [Optional]: Many real-world applications with dynamical systems will require the learning machine to keep up with a changing world.
    • Feedback mechanisms for incremental (and often online) learning using notions of information gain. 
    • Application to (some subset of) Search Engines, Recommender Systems, Stochastic Dynamic Programming, Control, Estimation, and Reinforcement Learning

The optional topics are ones that we may insert into the course by removing others from the list. We'll do this in the first two classes through a discussion.
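
As a small illustration of the recursive estimation mentioned under "In an Ideal World," the sketch below estimates a constant scalar from noisy measurements, one measurement at a time, and checks the result against the batch posterior. It is a minimal example under assumed values (the prior, noise variance, sample size, and all function names are hypothetical), not course material.

```python
import numpy as np

def recursive_update(mean, var, y, noise_var):
    """One Bayesian update of a Gaussian belief N(mean, var) about a scalar x
    after observing y = x + noise, with noise ~ N(0, noise_var)."""
    gain = var / (var + noise_var)        # weight given to the new measurement
    new_mean = mean + gain * (y - mean)   # correct the estimate by the innovation
    new_var = (1.0 - gain) * var          # uncertainty shrinks with each update
    return new_mean, new_var

def batch_posterior(prior_mean, prior_var, ys, noise_var):
    """Posterior from all measurements at once (precision-weighted average)."""
    precision = 1.0 / prior_var + len(ys) / noise_var
    mean = (prior_mean / prior_var + np.sum(ys) / noise_var) / precision
    return mean, 1.0 / precision

# Assumed illustrative setup: true value, noise level, and prior are made up.
rng = np.random.default_rng(0)
true_x, noise_var = 2.0, 0.5
ys = true_x + rng.normal(0.0, np.sqrt(noise_var), size=50)

mean, var = 0.0, 10.0                     # assumed prior N(0, 10)
for y in ys:
    mean, var = recursive_update(mean, var, y, noise_var)

batch_mean, batch_var = batch_posterior(0.0, 10.0, ys, noise_var)
print("recursive:", mean, var)
print("batch:    ", batch_mean, batch_var)
```

In this linear-Gaussian setting the sequential and batch answers agree up to floating-point error, which is what makes recursive processing attractive; that agreement generally breaks down once the model is nonlinear or the noise is non-Gaussian.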

Guest and Special Lectures [39]

  1. Sara Seager [Exoplanets]
  2. Dirk Smit [Seismic Inversion]
  3. Hao Sun [Learning Physics from Data]
  4. Brent Minchew [Glaciers]

Methods (These change from term to term). [1 each]

Starting in Fall 19, a section was introduced for methods. The number of ML practitioners has grown rapidly, and the community this course has fostered enables us to form an effective peer group to support your learning and growth.

  1. Trees, Hashes and Nearest Neighbors -- Indexing Schemes Tutorial
  2. Principal and Independent Components
  3. Sampling and Hamiltonian Monte Carlo
  4. Expectation Maximization with Application
  5. Gaussian Graphical Models
  6. Kernel Machines -- GPR, KPCA, SVM and Manifolds x 2
  7. Convolutional Networks
  8. Long Short-Term Memory 
  9. AutoEncoders  
  10. Generative Adversarial Networks
  11. Boosting and Bagging 

PSETS: Application to Systems Dynamics and Optimization

The PSET problems highlight key topics of interest in learning dynamics, where both real data sets and idealized models may be used to communicate key ideas and the potential role of learning. Three PSETS from the following topics are typically developed. The topics themselves are drawn from applications (see the end of this page).

  1. Detection: Use theory and data to learn a very low SNR detector.  Use learning to solve inverse problems more efficaciously [Level: Intermediate]
  2. Targeted Uncertainty Quantification:  Learn reduced models for rapid, targeted uncertainty quantification [Level: Intermediate]
  3. Informative Learning:  Automatically select parameters, data and model structure to maximize information gain while learning with continuous time neural networks [Level: Hard]
  4. Stable Parameterization: Learn to build hybrid physical-neural dynamical systems that integrate in a stable manner [Level: Easy]
  5. Interval Smoothing: Rapidly train a neural dynamical system from data [Level: Easy]
  6. Downscaling: Use Machine Learning to produce a super-resolution method for fluid dynamics [Level: Hard]
  7. Oscillator Discovery: Use model simulations of fluids to detect low-amplitude oscillations using learning [Level: Intermediate]
  8. Long-range forecasting: Develop models to forecast extremes at long time ranges. [Level: Easy]
  9. From Theory to Data and Back: Use theory to initialize learning machines and adapt them with data to advance theory [Level: Intermediate]
  10. Learnability and Predictability: Explore the relationship between what is predictable and what is learnable [Level: Hard]
  11. Trouble Spotter: Develop algorithms for fast anomaly detection using Crowds [Level: Easy]

Participation

 

There are several ways to take this course:

 

  • Casual Exposure: You may wish to attend one or more lectures, tutorials, or guest-lectures. If you aren't taking this class for credit, please make detailed notes and submit them. Consider joining one of the other students who are doing a project and supporting them in meaningful ways. 
  • Entry-level Participation: You want to learn specific methods. You must attend the Methods section and solve three PSETs or conduct a project. This is best for someone who wants to grow into a peer group of ML users; check out the growing community. (6Cr) To succeed in this course at the Entry level, you know a programming language, and some statistics and linear algebra.
  • Advanced Academic Participation: You are interested in gaining an in-depth understanding. You are well prepared either through courses or practice. You must attend the lectures and finish at least three problem sets. The methods portion can be highly useful but is not mandatory. (9Cr) To achieve at the Advanced level, you've had preparation in more than a third of the following subjects: Linear Algebra, Introductory Probability and Statistics, Introductory ODE/PDEs, Numerical Methods, Estimation, Inference & Information, Control, Optimization, or Dynamics. An introductory Data Science or Machine Learning course is highly useful but not compulsory.
  • Research Participation: You have a specific research project that needs ML. You attend the main lectures, methods, or both, and you present a paper based on your project and course material. (3 - 12 Cr) To derive value at the Research level, you are examining ML in your research (e.g., for a generals project or thesis). It is highly recommended that you talk to the instructor when selecting this option.

Applications

Data Assimilation, Autonomous Environmental Mapping, Model Reduction, Uncertainty Quantification, Sensor Planning, Prediction and Predictability, Planning for Risk Mitigation, Convective Super-parameterization, Radiative-Convective Equilibrium, Nonlocal Operators, Teleconnections, Particle Detection and Sizing, Species Characterization, Paleoclimate, Event Detection and Tracking in X (Volcanoes, Earthquakes, Hurricanes, Tsunamis, Storms and Transits), Super-resolution/Downscaling, Coherent Structures in Turbulence, Seismic Imaging and Geomorphology, Porous Media, Reservoirs, Exoplanets. However, other Engineering, Science and Finance applications may be included depending on participant interest. While our motivation and reach are broad, we will drill down each term to a few core applications that are set by participant interest.

Books

The course uses a variety of material: books, notes, online material and code. You can expect each of these to be released before the start of the topic in class.

  1. C. M. Bishop, Pattern Recognition and Machine Learning
  2. T. Hastie et al., The Elements of Statistical Learning
  3. I. Goodfellow et al., Deep Learning
  4. B. Schölkopf and A. J. Smola, Learning with Kernels
  5. S. Raschka, Python Machine Learning
  6. https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/
  7. https://www.probabilitycourse.com/ -- for some of the statistics we will need
