Collapsed likelihood and sampling hyperparameters.
In Bayesian models for NLP there is a common pattern of modeling something as coming from a discrete distribution that has a Dirichlet prior.
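Here is a minimal sketch of that generative pattern in plain Python, using only the standard library. The function names are hypothetical. Theta is drawn from a Dirichlet by normalizing independent gamma draws, and then each observation is drawn from the categorical distribution theta.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_tokens(alpha, n_tokens, rng):
    """theta ~ Dirichlet(alpha), then each token x_i ~ Categorical(theta)."""
    theta = sample_dirichlet(alpha, rng)
    tokens = rng.choices(range(len(alpha)), weights=theta, k=n_tokens)
    return theta, tokens


# Example: a 3-dimensional symmetric prior with alpha = 0.5, ten observations.
rng = random.Random(0)
theta, tokens = generate_tokens([0.5, 0.5, 0.5], n_tokens=10, rng=rng)
```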
It also helps to integrate out the variable theta, which makes the sampler converge much faster in the right direction. A good intuition for why is the difference between stochastic and batch gradient descent. A collapsed sampler updates the counts after every single draw, so by the time you've done one pass through a really big dataset, things have already moved a lot in the correct direction. As a bonus, a collapsed sampler accounts for theta exactly, by integrating it out analytically instead of working with a sampled value.
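For reference, this is the standard Dirichlet-multinomial result. The notation is assumed here: n_{dk} is the number of tokens in group d (e.g. a document) that take value k, N_d = \sum_k n_{dk}, and \alpha_1, \dots, \alpha_K is the prior. Integrating out theta gives a closed-form likelihood. The collapsed conditional for a single token depends only on counts that exclude that token, which is why each draw immediately changes the counts seen by the next one. In LDA, the conditional below is the document-side factor, and it is multiplied by the corresponding topic-word term.

$$
p(z_d \mid \alpha) = \int p(z_d \mid \theta_d)\, p(\theta_d \mid \alpha)\, d\theta_d
= \frac{\Gamma\left(\sum_{k} \alpha_k\right)}{\Gamma\left(N_d + \sum_{k} \alpha_k\right)} \prod_{k=1}^{K} \frac{\Gamma(n_{dk} + \alpha_k)}{\Gamma(\alpha_k)}
$$

$$
p(z_{di} = k \mid z_{-di}, \alpha) = \frac{n_{dk}^{-di} + \alpha_k}{N_d - 1 + \sum_{j} \alpha_j}
$$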
Sampling the decision variables for this sort of model is reasonably easy, but I've found myself writing many different variants of the collapsed complete-data likelihood, which is what you need in order to sample the hyperparameters. For example, for an asymmetric alpha vector I use a function of tcount, gmean, gvar and tprior,
where tcount is, in the LDA document-topic case, an Ndocuments x Ntopics matrix of counts, gmean and gvar are parameters of a gamma prior on the individual alphas, and tprior is the current vector of alphas.
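Here is a minimal sketch of such a function in plain Python. Only tcount, tprior, gmean and gvar come from the text. The function names are hypothetical, and lists of lists stand in for the count matrix. Reading gmean and gvar as the mean and variance of the gamma prior is an assumption based on their names. Each row adds its Dirichlet-multinomial term, and each alpha adds an independent gamma log density.

```python
from math import lgamma, log


def dirmult_loglik(tcount, alphas):
    """Collapsed log-likelihood of every row of tcount under Dirichlet(alphas).

    tcount: list of rows (e.g. documents), each a list of per-topic counts.
    alphas: one concentration parameter per column (asymmetric prior).
    """
    alpha_sum = sum(alphas)
    total = 0.0
    for row in tcount:
        total += lgamma(alpha_sum) - lgamma(sum(row) + alpha_sum)
        for n, a in zip(row, alphas):
            total += lgamma(n + a) - lgamma(a)
    return total


def gamma_logpdf_mean_var(x, mean, var):
    """Gamma log density, ASSUMING the prior is parameterized by mean and variance."""
    shape = mean * mean / var
    rate = mean / var
    return shape * log(rate) - lgamma(shape) + (shape - 1.0) * log(x) - rate * x


def asym_loglik(tcount, tprior, gmean, gvar):
    """Collapsed likelihood plus an independent gamma prior on each alpha in tprior."""
    if any(a <= 0.0 for a in tprior):
        return float("-inf")  # outside the support of the gamma prior
    prior = sum(gamma_logpdf_mean_var(a, gmean, gvar) for a in tprior)
    return dirmult_loglik(tcount, tprior) + prior
```

Returning minus infinity for non-positive alphas lets a generic sampler treat those values as having zero density.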
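Or, if you want to deal with a single alpha (i.e., a symmetric Dirichlet prior), there is a simpler variant. However, it can be very slow when the topic-word counts are very sparse (as they are when you have lots of topics), so in that case I use a version that exploits the sparsity.
For that version you have to pass the vocabulary size as a parameter, s1. Since it does not assume a DP, the number of dimensions is fixed and has to be supplied.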
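Here is a sketch of the symmetric case and a sparse variant, in plain Python. Apart from s1, all names are hypothetical. A zero count contributes lgamma(alpha) - lgamma(alpha) = 0, so in this sketch the sparse version stores and visits only the nonzero entries. For the same reason, it cannot infer the number of dimensions from its input and takes s1 instead.

```python
from math import lgamma


def sym_loglik(counts, alpha):
    """Collapsed log-likelihood with one scalar alpha (symmetric Dirichlet).

    counts: dense list of rows; every row has one entry per dimension.
    """
    total = 0.0
    for row in counts:
        dim = len(row)
        total += lgamma(dim * alpha) - lgamma(sum(row) + dim * alpha)
        for n in row:
            total += lgamma(n + alpha) - lgamma(alpha)
    return total


def sym_loglik_sparse(counts, alpha, s1):
    """Same quantity, but each row is a dict {word_id: count} holding only nonzero counts.

    Zero counts would contribute exactly 0, so they are skipped; the number of
    dimensions (the vocabulary size s1) must therefore be passed explicitly.
    """
    total = 0.0
    for row in counts:
        total += lgamma(s1 * alpha) - lgamma(sum(row.values()) + s1 * alpha)
        for n in row.values():
            total += lgamma(n + alpha) - lgamma(alpha)
    return total
```

The cost per row of the sparse version grows with the number of nonzero counts, not with s1.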
With these functions and a slice sampler, it becomes very easy to incorporate hyperparameter sampling into your models.
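Here is a minimal sketch of such a slice sampler. It assumes the univariate stepping-out and shrinkage procedure from Neal's "Slice sampling" (2003), which is not necessarily the sampler the text refers to. Because alpha must be positive, the sketch samples u = log(alpha) and adds u to the log density, which is the Jacobian of alpha = exp(u). log_lik can be any function returning an unnormalized log density in alpha, such as a collapsed likelihood plus a prior of the kind described above. The names slice_sample and resample_alpha are hypothetical.

```python
import math
import random


def slice_sample(x0, log_density, rng, width=1.0, max_steps=50):
    """One univariate slice-sampling update with stepping out and shrinkage.

    log_density may be unnormalized; rng is a random.Random instance.
    """
    # Slice height on the log scale: log(f(x0) * U), U ~ Uniform(0, 1),
    # which equals log f(x0) - E with E ~ Exponential(1).
    log_y = log_density(x0) - rng.expovariate(1.0)

    # Step out: randomly position an interval of size `width` around x0, then
    # extend each end while it is still inside the slice. The max_steps budget
    # is split randomly between the two sides, as in Neal's procedure.
    left = x0 - width * rng.random()
    right = left + width
    steps_left = int(max_steps * rng.random())
    steps_right = max_steps - 1 - steps_left
    while steps_left > 0 and log_density(left) > log_y:
        left -= width
        steps_left -= 1
    while steps_right > 0 and log_density(right) > log_y:
        right += width
        steps_right -= 1

    # Shrink: propose uniformly in [left, right]; on rejection, move the
    # endpoint on the proposal's side of x0 inward to the proposal.
    while True:
        x1 = rng.uniform(left, right)
        if log_density(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1


def resample_alpha(alpha, log_lik, rng, width=1.0):
    """Resample a positive hyperparameter by slice sampling u = log(alpha).

    log_lik(alpha) is the unnormalized log density of alpha. Adding u
    accounts for the Jacobian of the change of variables alpha = exp(u).
    """
    def log_target(u):
        return log_lik(math.exp(u)) + u

    return math.exp(slice_sample(math.log(alpha), log_target, rng, width))
```

For an asymmetric alpha vector, the same update can be applied to one coordinate at a time while holding the others fixed.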