Opinion: relation between Machine Learning and Probabilistic Programming
Reposting my comment from Hacker News here:
Both Machine Learning (ML) and Probabilistic Programming (PP) work towards building a mathematical object (a model) that takes observations as input and produces predictions as output. Both ML and PP are about finding the parameters of the model, which requires bits of information. The main differences:
The first main difference is the kind of information used to find the model parameters.
In ML, the information used to "train" the model is called the "training set" and usually consists of observation-prediction pairs (or just observations, for unsupervised learning) of the same kind as the observation-prediction pairs the model is meant to produce. In other words, with ML you "fit" your model using the same type of data that you will later use the model on. At least, this is the most common scenario in ML.
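A minimal sketch of that common scenario, assuming a one-variable linear model and a made-up training set: the model is fitted on (observation, prediction) pairs and then applied to a new observation of the same kind. The names `fit_line`, `training_set` and `predict` are hypothetical and only for illustration.

```python
# Usual ML setup: the training set holds (observation, prediction) pairs of
# exactly the kind the model will see later. Data below is hypothetical.

def fit_line(pairs):
    """Ordinary least squares fit of y = a * x + b to (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

# Hypothetical training set: observations x paired with desired predictions y.
training_set = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]
a, b = fit_line(training_set)

def predict(x):
    """Apply the fitted model to a new observation of the same kind."""
    return a * x + b

print(predict(5.0))
```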
In the problems PP specializes in, this is less common. For example, with PP we may well see a model that predicts where a planet is, while the model itself is fitted using observations of apples falling from a tree.
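The apple-and-planet example can be sketched as a toy probabilistic computation. Everything here is an illustrative assumption: the drop heights and times are made up, timing noise is Gaussian with a 0.02 s standard deviation, the prior on g is flat over a grid, and, to keep the physics to a few lines, the Moon's orbital period stands in for the planet. The posterior over g comes only from apples; Newton's relation GM = g * R^2 and Kepler's third law then turn it into a distribution over a quantity the data never contained.

```python
import math

# Hypothetical observations: (drop height in m, measured fall time in s).
apple_drops = [(2.0, 0.64), (3.0, 0.78), (5.0, 1.01)]
TIME_NOISE_SD = 0.02      # assumed timing error (s)
EARTH_RADIUS = 6.371e6    # m, approximate
MOON_DISTANCE = 3.844e8   # m, approximate mean Earth-Moon distance

def log_likelihood(g):
    """Gaussian log-likelihood of the fall times given g, using t = sqrt(2h / g)."""
    total = 0.0
    for h, t in apple_drops:
        predicted = math.sqrt(2.0 * h / g)
        total += -0.5 * ((t - predicted) / TIME_NOISE_SD) ** 2
    return total

# Flat prior on a grid of candidate g values (m/s^2), normalized posterior weights.
grid = [5.0 + 0.01 * i for i in range(1001)]  # 5.00 .. 15.00
log_post = [log_likelihood(g) for g in grid]
max_lp = max(log_post)
weights = [math.exp(lp - max_lp) for lp in log_post]
norm = sum(weights)
posterior = [w / norm for w in weights]

def moon_period_days(g):
    """GM = g * R^2, then Kepler's third law for a circular orbit."""
    gm = g * EARTH_RADIUS ** 2
    seconds = 2.0 * math.pi * math.sqrt(MOON_DISTANCE ** 3 / gm)
    return seconds / 86400.0

# The prediction is a distribution over the Moon's period, summarized by mean and sd.
mean_g = sum(p * g for p, g in zip(posterior, grid))
mean_period = sum(p * moon_period_days(g) for p, g in zip(posterior, grid))
sd_period = math.sqrt(
    sum(p * (moon_period_days(g) - mean_period) ** 2 for p, g in zip(posterior, grid))
)
print(f"g ~ {mean_g:.2f} m/s^2, Moon period ~ {mean_period:.1f} +/- {sd_period:.1f} days")
```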
The type of output is the second difference.
ML typically focuses on producing a single, most likely value of the prediction (a maximum a posteriori, or MAP, estimate), while PP produces a probability distribution over the predicted quantity.
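A minimal sketch of the two kinds of output for a coin-bias question, assuming a Beta(2, 2) prior and hypothetical data of 7 heads in 10 tosses. With a conjugate prior, both the point estimate and the full posterior have closed forms.

```python
# Hypothetical data and an assumed Beta(2, 2) prior on the coin's bias.
heads, tosses = 7, 10
alpha = 2.0 + heads               # posterior Beta parameters (conjugate update)
beta = 2.0 + (tosses - heads)

# Point estimate (MAP): the mode of the Beta posterior, a single number.
map_estimate = (alpha - 1.0) / (alpha + beta - 2.0)

# Full distribution: Beta(alpha, beta) itself, summarized by its mean and sd.
post_mean = alpha / (alpha + beta)
post_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
post_sd = post_var ** 0.5

print(f"MAP: {map_estimate:.3f}")
print(f"posterior Beta({alpha:.0f}, {beta:.0f}): mean {post_mean:.3f}, sd {post_sd:.3f}")
```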
Arguably, a probability distribution over the prediction is more useful in any problem where the prediction feeds a decision-making process, because we can then compute the expected cost of each decision (and of each misdecision) over the possible values of the prediction.
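A sketch of that calculation, assuming the predictive distribution is available as samples (here simply drawn from a normal distribution as a stand-in for a model's posterior predictive) and assuming made-up costs: an unsold unit costs 1 and a unit of unmet demand costs 4. Each candidate decision is scored by its Monte Carlo expected cost, and the cheapest one is chosen.

```python
import random

random.seed(0)
# Stand-in for a probabilistic model's predictive distribution of demand.
demand_samples = [random.gauss(100.0, 15.0) for _ in range(5000)]

OVERSTOCK_COST = 1.0   # assumed cost per unsold unit
UNDERSTOCK_COST = 4.0  # assumed cost per unit of unmet demand

def cost(stock, demand):
    """Cost of a (mis)decision: stocking `stock` units when `demand` occurs."""
    if stock >= demand:
        return OVERSTOCK_COST * (stock - demand)
    return UNDERSTOCK_COST * (demand - stock)

def expected_cost(stock):
    """Monte Carlo estimate of the expected cost over the predicted demand."""
    return sum(cost(stock, d) for d in demand_samples) / len(demand_samples)

candidates = range(50, 151)
best_stock = min(candidates, key=expected_cost)
print(best_stock, round(expected_cost(best_stock), 2))
```

With these asymmetric costs the chosen stock lands well above the most likely demand, near the 80% quantile (4 / (4 + 1)); a single most-likely value would not reveal that.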
In practice, many decision-making situations can be reframed as estimating some quantity (instead of choosing some arbitrary action), and an ML model can be built to produce that estimate directly, so that the prediction is at the same time the decision.
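A sketch of that ML route under the same assumed costs (1 per unsold unit, 4 per unit of unmet demand): a linear model is trained by subgradient descent directly on the asymmetric decision cost (the pinball, or quantile, loss), so its output is the stocking decision itself. The feature, data and hyperparameters are hypothetical.

```python
import random

random.seed(1)
OVERSTOCK_COST = 1.0   # assumed cost per unsold unit
UNDERSTOCK_COST = 4.0  # assumed cost per unit of unmet demand

# Hypothetical history: a feature x (e.g. promotion intensity) and realized demand.
xs = [random.uniform(0.0, 10.0) for _ in range(500)]
demands = [80.0 + 4.0 * x + random.gauss(0.0, 10.0) for x in xs]
x_mean = sum(xs) / len(xs)

def decision_loss_grad(pred, demand):
    """Subgradient of the asymmetric decision cost with respect to the prediction."""
    return -UNDERSTOCK_COST if demand > pred else OVERSTOCK_COST

# Linear model stock = a * (x - x_mean) + b, trained on the decision cost itself.
a, b = 0.0, sum(demands) / len(demands)
learning_rate = 0.1
for _ in range(2000):
    grad_a = grad_b = 0.0
    for x, d in zip(xs, demands):
        xc = x - x_mean
        g = decision_loss_grad(a * xc + b, d)
        grad_a += g * xc
        grad_b += g
    a -= learning_rate * grad_a / len(xs)
    b -= learning_rate * grad_b / len(xs)

def stock_decision(x):
    """The trained model's output is directly the decision: how much to stock."""
    return a * (x - x_mean) + b

print(round(a, 2), round(stock_decision(5.0), 1))
```

For these costs the cost-minimizing output is the conditional 80% quantile of demand, so roughly a fifth of the historical demands should end up above the model's decisions.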
The speed of prediction is the third difference.
After the "slow" training phase, an ML model is a mathematical object that is a plain function, which can be evaluated very quickly on any observation to produce a prediction (think: a forward pass through a deep-learning network).
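A sketch of why prediction is cheap once training is done, assuming a tiny two-layer network whose weights (hypothetical values here) came out of some earlier training run: a prediction is just a fixed sequence of multiply-adds and nonlinearities, with no inference loop.

```python
import math

# Hypothetical weights, standing in for the result of a (slow) training run.
W1 = [[0.5, -1.2], [0.8, 0.3], [-0.4, 0.9]]   # 3 hidden units x 2 inputs
B1 = [0.1, -0.2, 0.05]
W2 = [1.0, -0.7, 0.6]
B2 = 0.2

def forward(observation):
    """One forward pass: a tanh hidden layer followed by a linear output."""
    hidden = [
        math.tanh(sum(w * o for w, o in zip(row, observation)) + bias)
        for row, bias in zip(W1, B1)
    ]
    return sum(w * h for w, h in zip(W2, hidden)) + B2

print(forward([1.0, 2.0]))
```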
With PP, for most common models, you need to go through a "slow" inference process (MCMC or similar) for every new observation to produce a prediction. In some cases, it is possible to design a probabilistic model so that it only needs to be fitted once and can then be turned into an analytical expression (a function) that produces predictions from observations.
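A sketch of the two paths, assuming a conjugate Normal-Normal model (a Gaussian prior on a latent mean mu, Gaussian observation noise) so that the answer is known in closed form. `predict_mcmc` reruns a random-walk Metropolis sampler for every new set of observations, while `predict_analytic` is the fitted-once analytical expression. Both functions and all constants are illustrative.

```python
import math
import random

PRIOR_MEAN, PRIOR_SD = 0.0, 10.0   # assumed Gaussian prior on the latent mu
NOISE_SD = 1.0                     # assumed Gaussian observation noise

def log_posterior(mu, ys):
    """Unnormalized log posterior of mu given observations ys."""
    lp = -0.5 * ((mu - PRIOR_MEAN) / PRIOR_SD) ** 2
    lp += sum(-0.5 * ((y - mu) / NOISE_SD) ** 2 for y in ys)
    return lp

def predict_mcmc(ys, n_steps=20000, step=0.5, seed=0):
    """Slow path: random-walk Metropolis, rerun for every new set of observations."""
    rng = random.Random(seed)
    mu = 0.0
    lp = log_posterior(mu, ys)
    samples = []
    for i in range(n_steps):
        proposal = mu + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, ys)
        # Accept with probability min(1, exp(lp_new - lp)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            mu, lp = proposal, lp_new
        if i >= n_steps // 10:   # discard the first 10% as burn-in
            samples.append(mu)
    return sum(samples) / len(samples)

def predict_analytic(ys):
    """Fast path: the conjugate model gives the posterior mean in closed form."""
    precision = 1.0 / PRIOR_SD ** 2 + len(ys) / NOISE_SD ** 2
    weighted = PRIOR_MEAN / PRIOR_SD ** 2 + sum(ys) / NOISE_SD ** 2
    return weighted / precision

ys = [2.1, 1.7, 2.4, 1.9]   # hypothetical new observations
print(predict_mcmc(ys), predict_analytic(ys))
```

Both return approximately the same posterior mean; the MCMC path costs tens of thousands of likelihood evaluations per call, the analytical path a handful of arithmetic operations.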
As a result of these differences, people applying PP and people applying ML have different characters :) Still, I expect ML and PP to converge over time. Specifically, I notice that:
ML models are used to represent probability distributions in probabilistic models; and
trained ML models are being interpreted in terms of their information content (that is, in probabilistic terms, as if the ML model were a probabilistic model).
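A well-known instance of the second point, sketched under assumed data and constants: L2-regularized (ridge) least squares for y = w * x returns the same weight as the MAP estimate in a model with Gaussian noise of standard deviation sigma and a Gaussian prior of standard deviation tau on w, provided the penalty is lambda = sigma^2 / tau^2. The grid search only shows the MAP numerically; it is not meant as a practical method.

```python
# Hypothetical data for a 1-D model y = w * x (no intercept).
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1.2, 1.9, 3.2, 3.9, 5.1]
NOISE_SD = 0.5   # assumed observation noise sigma
PRIOR_SD = 1.0   # assumed prior sd tau on w

# ML view: ridge regression with penalty lam = sigma^2 / tau^2, in closed form.
lam = NOISE_SD ** 2 / PRIOR_SD ** 2
w_ridge = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# Probabilistic view: maximize log p(w | data) with y ~ N(w x, sigma^2), w ~ N(0, tau^2).
def log_posterior(w):
    ll = sum(-0.5 * ((y - w * x) / NOISE_SD) ** 2 for x, y in zip(xs, ys))
    return ll - 0.5 * (w / PRIOR_SD) ** 2

grid = [i * 1e-4 for i in range(40001)]   # candidate w in [0, 4]
w_map = max(grid, key=log_posterior)

print(w_ridge, w_map)
```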