Machine Learning With System Dynamics and Optimization

Spring 2020

MIT: 12.S592, 9Cr (3-1-5), U/G

Lec: Fr. 0900-1200 @ 54-1623
Rec: Fr. 1300-1400 @ 54-1827

Instructor: Sai Ravela (ravela@mit.edu)
with: Aime Fournier, Michael Barbehenn, Pawan Bharadwaj, Gregory Britten, Eric Beauce, Brindha Kanniah


Motivation

Is there more to ML than meets the eye? Despite numerous toolkits and excellent introductory tutorials, useful problem-solving often stalls beyond the initial "copy-paste" application. Did model selection leave you with lousy models? Do you want to know how to find great features? Are you stuck with a sizeable pre-trained network but want a more efficient one? Are you curious how Learning departs from estimation, control, and optimization, or whether it departs at all? Or how machines could learn from both Theory and Data? Do you wonder what general Information principles underpin Learning? If you answered yes to any of these, MLSDO is here to help. The course 12.S592 (MLSDO) explores machine learning from a novel and rigorous systems dynamics and optimization perspective, helping you understand the strengths and weaknesses of learning methods and confidently consider learning machines for your own work. Check out a teaser here.

Who Benefits?

This course is most beneficial if you have gotten past writing "Hello World" ML applications, e.g., using MNIST or CIFAR. It is especially helpful if you are somewhat at a loss developing a methodology for real-world problems. The course is great fun for those who like to think from first principles and want to examine ML that way. Those with a Physics, Engineering, or Mathematics preparation will particularly enjoy 12.S592, and those with industry experience seem to like it as well. Check out a sample lecture here; if that made sense, you are in the right place.

About Us

MLSDO (and its predecessor) has been taught since 2013. The present course is taught by a Systems Scientist, an Applied Mathematician, and an AI architect from industry. Several Guest Lecturers describe the successes, opportunities, and challenges for ML in their respective fields. ML-practicing Climate and Earth Scientists, many of them products of previous versions of this course, offer tutorials to help you become skillful and support your ML growth.

Structure and Content

MLSDO is an "infinite course," continuing each term as it rotates round-robin among topics. Please attend the first two lectures; the term's topic set is cast quickly! Typically we have the following components: Lectures (fundamentals), Guest Lectures (applications), Tutorials (methods), and PSETs/Projects (problem-solving). Here is the proposed set for this term:

Foundations: Dynamics of Learning

  1. Systems Dynamics and Optimization [2]: The elements of interaction with the real world in a variety of physical applications involving Prediction and Discovery using models and data. The SDO cycle. The need for ML.
  2. Theory vs. Data [4]: Using very simple problems, we quickly learn to stop being irrationally exuberant about Learning.
    • Model Error in Learning and its disastrous effects. 
    • How can theory help? Conservation, Invariance, Symmetry, Sparsity, Feedbacks, and other "theory-driven" constraints accelerate learning.
    • How can Learning "without theory" become effective by "maximizing information gain"?
  3. In an Ideal World [8]: Using a linear-Gaussian world, we look at optimal solutions to learning, developing basic ideas that summarize the whole course: data organization, offline-online learning, finding good features, model selection, conditional independence, sequential and recursive inference, and informative learning (a minimal sketch of recursive inference appears after this list). We also learn how far optimality in a linear-Gaussian world falls short for the many problems that live in a nonlinear, non-Gaussian world with complex structure.
  4. Learning Dilemmas [11]: We confront how fundamentally difficult it is to "learn well."
    • Risk vs. Empirical Risk, Generalization vs. Extrapolation, Stability vs. Generalization, Consistency.
    • Localization, hyperparameters, Bias vs. Variance, Invariance vs. Selectivity, No Free Lunch, and Universal Computation.
  5. Estimation on Graphs [15]: Variational Inference, Bayesian Inference, Probabilistic Graphical Models.
  6. Measures and Information [19]: We touch the very core by thinking together about similarity measures. We study Lp spaces, RKHS, Entropy, and Transport, and how to optimize them in practice.
  7. Regularization [24]: We investigate the tussle between "fitting" and "constraining" (a kernel ridge sketch after this list illustrates the idea).
    • Tikhonov, Morozov, Ivanov, Lp sparsity, Natural Statistics, Entropy, Randomization, Augmentation, Leave-out, Theory-guided sparsity.
    • Smoothness vs. sparsity, and the relationship to Bayes.
    • Tunable Sparsity.
    • Concentration Inequalities with application to model selection.
  8. Kernels [28]: We then explore data and features: Kernels (Symmetric Positive Definite forms), Graph Spectra, key kernel applications, and the kernel interpretation/reduction of many ML algorithms.
  9. Dynamics of Deep Learning [32]:
    • Types of dynamical systems associated with Learning.
    • Two-point boundary value problems (2BVP), Error Dynamics, and Stability.
    • Bayesian Parameter Uncertainty Estimation.
    • Stochastic PDE Information Transfer Efficiency.
  10. Informative Learning [36]:
    • Understand the notion of information gain and its value in data selection, parameter and structure selection, feature selection, and adaptation of representation. Understand the NP-hardness of quantifying information gain, and practical strategies such as playing the game of twenty questions.
    • Learning for Information Gain: targeted data and parameter selection.
    • Estimating Network Structure: initialization and adaptation.
  11. Neural Dynamical Systems [Optional]:
    • Continuous and Discrete-time Neural Dynamical Systems 
    • Exact Networks and Bounds on Approximate Network Size
    • Stable Learning in Hybrid Dynamics
    • Uncertainty Quantification
  12. Relevance, Recommendation, and Reinforcement [Optional]: Many real-world applications with dynamical systems will require the learning machine to keep up with a changing world.
    • Feedback mechanisms for incremental (and often online) learning using notions of information gain. 
    • Application to (some subset of) Search Engines, Recommender Systems, Stochastic Dynamic Programming, Control, Estimation, and Reinforcement Learning.
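To make topic 3 concrete, here is a minimal, hypothetical sketch of sequential and recursive inference in a linear-Gaussian world: a scalar state observed repeatedly in Gaussian noise, with the Gaussian posterior updated one observation at a time rather than by batch least squares. All values (state, noise variance, prior) are illustrative assumptions, not course code.

import numpy as np

rng = np.random.default_rng(0)

x_true = 2.0           # unknown scalar state (illustrative)
obs_var = 0.5          # observation noise variance (assumed known)
mean, var = 0.0, 10.0  # diffuse Gaussian prior on the state

for _ in range(25):
    y = x_true + rng.normal(scale=np.sqrt(obs_var))  # noisy observation
    gain = var / (var + obs_var)         # Kalman gain for a scalar state
    mean += gain * (y - mean)            # recursive posterior mean update
    var *= (1.0 - gain)                  # posterior variance shrinks with data

print(f"posterior mean ~ {mean:.3f}, posterior variance ~ {var:.4f}")

The same recursion, written with state vectors and covariance matrices, is the Kalman filter at the heart of the linear-Gaussian story.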

The optional topics are ones we may insert into the course by removing others from the list; we will decide this through discussion in the first two classes.
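Similarly, for topics 7 and 8, the following hedged sketch shows Tikhonov regularization in a kernel setting: kernel ridge regression with an RBF kernel, where the weight lam trades "fitting" against "constraining" and the symmetric positive-definite kernel matrix supplies the features. The toy data and parameter values are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(1)

def rbf_kernel(A, B, length_scale=0.5):
    # Symmetric positive-definite RBF kernel between rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale**2))

# Toy 1-D data: a noisy sine wave (illustrative only)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

lam = 1e-2                                            # Tikhonov weight
K = rbf_kernel(X, X)                                  # the "fitting" term
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # (K + lam*I) alpha = y

X_new = np.linspace(-3, 3, 5)[:, None]
print(np.round(rbf_kernel(X_new, X) @ alpha, 3))      # smoothed predictions

Increasing lam yields smoother, more constrained fits; letting lam go to zero interpolates the noise. That is exactly the tussle topic 7 names.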

Guest and Special Lectures [39]

  1. Sara Seager [Exoplanets]
  2. Dirk Smit [Seismic Inversion]
  3. Hao Sun [Learning Physics from Data]
  4. Sai Ravela [Some Amazing Learning Ideas from Computer Vision]

Methods (these change from term to term) [1 each]

A methods section was introduced starting in Fall 2019. The community of ML practitioners this course has fostered has grown rapidly, and it enables us to form an effective peer group to support your learning and growth.

  1. Python with Keras/PyTorch -- Eric Beauce (2/7)
  2. Principal and Independent Components -- Pawan Bharadwaj (2/14)
  3. Gaussian Mixtures and Expectation Maximization -- Gregory Britten (2/21)
  4. Inference on Graphs and Networks -- Gregory Britten (2/28)
  5. Sampling and Hamiltonian Monte Carlo -- Aime Fournier (3/6)
  6. Trees, Hashes and Nearest Neighbors -- Michael Barbehenn (3/13)
  7. Kernel Machines -- Sai Ravela (3/20)
  8. Convolutional Neural Networks -- Eric Beauce (4/3)
  9. Long Short-Term Memory -- Brindha Kanniah (4/10)
  10. AutoEncoders -- Brindha Kanniah (4/17)
  11. Generative Adversarial Networks -- Sai Ravela (4/24)
  12. Random Forests -- Sai Ravela (5/1)
  13. Boosting -- Sai Ravela (5/8)

PSETS: Application to Systems Dynamics and Optimization

The PSET problems highlight key topics of interest in learning dynamics, where both real data sets and idealized models may be used to communicate key ideas and the potential role of learning. Three PSETs from the following topics are typically developed; the topics themselves are drawn from applications (see the end of this page).

  1. Detection: Use theory and data to learn a very low-SNR detector; use learning to solve inverse problems more effectively [Level: Intermediate]
  2. Reduction and Uncertainty Quantification: Learn reduced models for rapid, targeted uncertainty quantification (a minimal reduction sketch follows this list) [Level: Intermediate]
  3. Informative Learning: Automatically select parameters, data, and model structure to maximize information gain while learning with continuous-time neural networks [Level: Hard]
  4. Stable Parameterization: Learn to build hybrid physical-neural dynamical systems that integrate in a stable manner [Level: Easy]
  5. Interval Smoothing: Rapidly train a neural dynamical system from data [Level: Easy]
  6. Downscaling: Use Machine Learning to produce a super-resolution method for fluid dynamics [Level: Hard]
  7. Oscillator Discovery: Use model simulations of fluids to detect low-amplitude oscillations using learning [Level: Intermediate]
  8. Long-range forecasting: Develop models to forecast extremes at long time ranges. [Level: Easy]
  9. From Theory to Data and Back: Use theory to initialize learning machines and adapt them with data to advance theory [Level: Intermediate]
  10. Learnability and Predictability: Explore the relationship between what is predictable and what is learnable [Level: Hard]
  11. Trouble Spotter: Develop algorithms for fast anomaly detection using Crowds [Level: Easy]
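As a hedged illustration of the Reduction and Uncertainty Quantification theme above, the sketch below learns a proper orthogonal decomposition (POD/PCA) basis from model snapshots and uses it to reconstruct an ensemble cheaply. The snapshot matrix, sizes, and retained mode count are all made-up assumptions, not the actual PSET.

import numpy as np

rng = np.random.default_rng(2)

# Fake snapshot matrix: 200 state variables x 50 model snapshots (assumed data)
snapshots = rng.normal(size=(200, 50))
snapshots[:5] *= 10.0                   # a few energetic directions to recover

mean = snapshots.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
basis = U[:, :5]                        # retain the 5 leading POD modes

# Project a perturbed ensemble into the reduced space and reconstruct
ensemble = mean + rng.normal(size=(200, 30))
coeffs = basis.T @ (ensemble - mean)    # 5 x 30 reduced coordinates
recon = mean + basis @ coeffs           # cheap low-rank reconstruction
err = np.linalg.norm(ensemble - recon) / np.linalg.norm(ensemble)
print(f"relative reconstruction error: {err:.3f}")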

Participation

There are several ways to take this course:

  • Casual Exposure: You may wish to attend one or more lectures, tutorials, or guest lectures. If you aren't taking this class for credit, please take detailed notes and share them with us. Consider joining one of the students doing a project and supporting them in meaningful ways.
  • Entry-level Participation: You want to quickly learn how to use the methods we cover in the Methods section (Rec). You must attend the Methods section and solve three PSETs. This is best for someone who would like to grow into a peer group of ML users; check out the growing community. (6Cr)
  • Advanced-level Academic Participation: You are interested in gaining a more in-depth understanding, and you are well prepared either through courses or practice. You must attend the lectures and finish at least three problem sets. The methods portion is highly useful but not mandatory. (9Cr)
  • Advanced-level Research Participation: You have a specific research project that needs ML. You attend the main lectures, methods, or both, and you present a paper based on your project and course material. It is highly advisable to talk to the instructor when selecting this option. (1 - 12 Cr)

To succeed in this course at the Entry level, you should know a programming language, statistics, and linear algebra.

To succeed at the Advanced levels, you should have preparation in more than a third of the following: Linear Algebra, Introductory Probability and Statistics, Introductory ODEs/PDEs, Numerical Methods, Estimation, Inference & Information, Control, Optimization, Decision Theory, or Dynamics. An introductory Data Science or Machine Learning course is highly useful but not compulsory. To succeed at the Research level, you are either using ML in your research or are considering it for a generals project. If in doubt, talk to the instructor. We have also had people attend the course multiple times, apparently to grasp particular topics or sections more closely and deeply.

Applications

Data Assimilation, Autonomous Environmental Mapping, Model Reduction, Uncertainty Quantification, Sensor Planning, Prediction and Predictability, Planning for Risk Mitigation, Convective Super-parameterization, Radiative-Convective Equilibrium, Nonlocal Operators, Teleconnections, Particle Detection and Sizing, Species Characterization, Paleoclimate, Event Detection and Tracking in X (Volcanoes, Earthquakes, Hurricanes, Tsunamis, Storms, and Transits), Super-resolution/Downscaling, Coherent Structures in Turbulence, Seismic Imaging and Geomorphology, Porous Media, Reservoirs, and Exoplanets. Other Engineering, Science, and Finance applications may be included depending on participant interest. While our motivation and reach are broad, each term we will drill down to a few core applications set by participant interest.

Books

The course uses a variety of materials: books, notes, online material, and code. You can expect each of these to be released before the corresponding topic starts in class.

  1. C. M. Bishop, "Pattern Recognition and Machine Learning"
  2. T. Hastie et al., "The Elements of Statistical Learning"
  3. I. Goodfellow et al., "Deep Learning"
  4. S. Raschka, "Python Machine Learning"

Sai Ravela (ravela@mit.edu)