Zhao et al., AAAI 2019, [link]
tags: representation learning - aaai - 2019
Two known shortcomings of VAEs are that (i) the variational bound (ELBO) can lead to a poor approximation of the true likelihood and inaccurate models, and (ii) the model can ignore the learned latent representation when the decoder is too powerful. In this work, the authors propose to tackle these problems by adding an explicit mutual information term to the standard VAE objective. The InfoVAE objective can be seen as a variant of the standard VAE objective with two main modifications: (i) an additional term that strives to maximize the mutual information between the input x and latent code z, to force the model to make use of the latent representation, and (ii) a weighting between the reconstruction and latent loss terms to better balance their contribution (similar to β-VAE).
The first two terms correspond to a weighted variant of the ELBO, while the last term adds a constraint on the distribution of latent codes, q_φ(z). Since estimating q_φ(z) would require marginalizing over all x, it is instead approximated by sampling (first x, then z ~ q_φ(z|x) from the encoder distribution).
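For reference, the objective has the following form (reconstructed from memory of the paper, so the exact parameterization of the weighting coefficients should be checked against the original):

```latex
\mathcal{L}_{\text{InfoVAE}} =
\mathbb{E}_{p_D(x)}\,\mathbb{E}_{q_\phi(z|x)}\!\left[\log p_\theta(x|z)\right]
- (1-\alpha)\,\mathbb{E}_{p_D(x)}\, D_{KL}\!\left(q_\phi(z|x)\,\|\,p(z)\right)
- (\alpha + \lambda - 1)\, D_{KL}\!\left(q_\phi(z)\,\|\,p(z)\right)
```

where α weights the mutual information term and λ scales the constraint on the aggregate latent distribution q_φ(z).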
Furthermore, the authors show that the objective is still valid when replacing this last term by any other hard divergence. In particular, this makes it a generalization of Adversarial Auto-encoders (AAE), which use the Jensen-Shannon divergence (approximated by an adversary).
The authors experiment with three different divergences: Jensen-Shannon (as in AAE), Stein Variational Gradient, and Maximum-Mean Discrepancy (MMD). Results seem to indicate that InfoVAE leads to more principled latent representations and a better balance between reconstruction quality and latent space usage. As such, reconstructions might not look as crisp as those from a vanilla VAE, but generated samples are of better quality (better generalization, which can also be seen in semi-supervised tasks that make use of the learned representation).
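As a side note, the MMD variant is simple to implement: it only compares samples from q_φ(z) and the prior p(z) through a kernel, with no adversary needed. A minimal pure-Python sketch (the RBF kernel, bandwidth, and the Gaussian stand-ins for encoder samples are illustrative choices, not the paper's exact setup):

```python
import math
import random

def rbf(u, v, bandwidth=1.0):
    # RBF kernel k(u, v) = exp(-||u - v||^2 / (2 * bandwidth^2))
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2 * bandwidth ** 2))

def mmd(xs, ys, bandwidth=1.0):
    # Biased sample estimate of MMD^2 between the distributions of xs and ys:
    # E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    kxx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

random.seed(0)
# Samples from the prior p(z) = N(0, I), and a shifted Gaussian standing in
# for aggregate encoder samples z ~ q_phi(z) that drifted away from the prior.
prior = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(128)]
agg_post = [[random.gauss(2, 1), random.gauss(2, 1)] for _ in range(128)]

same = mmd(prior, prior)        # identical sample sets: estimate is 0
shifted = mmd(prior, agg_post)  # mismatched distributions: clearly positive
```

Minimizing this MMD term pulls q_φ(z) toward the prior without requiring a density for either distribution, which is what makes it a drop-in replacement for the KL constraint.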