It should be clear to a typical researcher in inference, optimization, or control that deep learning is essentially a multi-stage two-point boundary value problem [1]. As a researcher in both data assimilation and machine learning, the connection I first wrote down, sometime in February 2016 while teaching machine learning, was that a typical feed-forward network can be expressed as a discrete multi-stage process, and that learning then proceeds as a parameter estimation problem driven by two boundary conditions: the input on the left boundary and the output on the right. This observation leads to many further connections, particularly once we consider recurrent networks and continuous-time neural networks, establishing a rich link between stochastic PDEs and learning. This is an active subject of research in our group.
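Concretely, here is a minimal sketch of the correspondence (the symbols $f_k$, $\theta_k$, $\ell$, $x_{\mathrm{in}}$ are my notation for this sketch, not taken from [1]). An $N$-layer feed-forward network is the discrete multi-stage process

$$
x_{k+1} = f_k(x_k, \theta_k), \qquad k = 0, 1, \dots, N-1,
$$

with the left boundary condition $x_0 = x_{\mathrm{in}}$ fixed by the data, and the right boundary condition $x_N \approx y$ enforced softly through a loss $\ell$. Learning is then the parameter estimation problem

$$
\min_{\theta_0, \dots, \theta_{N-1}} \; \ell(x_N, y)
\quad \text{subject to} \quad
x_{k+1} = f_k(x_k, \theta_k), \quad x_0 = x_{\mathrm{in}},
$$

whose adjoint (costate) equations recover backpropagation. Refining the stages yields the continuous-time limit $\dot{x}(t) = f(x(t), \theta(t))$, which is the setting in which recurrent and continuous-time networks connect to control and to the stochastic formulations mentioned above.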
For the full development of this connection, read the following notes.