Li et al, CVPR 2018, [link]
tags: domain adaptation - domain generalization - cvpr - 2018
MMD-matching techniques to domain generalization: No target data available, so instead align all domains to a prior distribution
The goal of domain generalization is to find a common domain-invariant feature space underlying the source and (unseen) target spaces, under the assumption that such a space exists. To learn such space, the authors propose a variant of , whose goal is to minimize the variance between the different source domains distributions using Maximum Mean Discrepancy. Additionally, the source distributions are aligned with a fixed prior distribution, with the hope that this reduces the risk of overfitting to the seen domains.
The proposed model,
MMD-AAE (Maximum Mean Discrepancy Adversarial Auto-encoder) consists in an encoder , that maps inputs to latent codes, and a decoder . These are equipped with a standard autoencoding loss to make the model learn meaningful embeddings
Based on the
AAE framework , we also want the learned latent codes to match a certain prior distribution, (In practice, a Laplace distribution). This is done by introducing a
GAN (Generative Adversarial Networks) loss term on the generated embeddings, with the prior as the true, target, distribution. Introducing , a discriminator with binary outputs, we have:
On top of the
AAE objective, the authors propose to regularize the feature space using
MMD, extended to the multi-domain setting. They key idea of the maximum mean discrepancy (
MMD) is to compare two distributions and using their mean statistics rather than density estimators:
A classical choice is to take as the space of linear functions and to leverage the kernel trick on a reproducing kernel Hilbert space to efficiently compute the difference of means
Following , the authors extend this metric for multiple domains. First, we define the distributional variance to measure the dissimilarity across domains using a mean map operator, computed via the distribution over all domains, .
which is 0 if and only if all the domain distributions are equal. Finally, this quantity is hard to compute but can be upper-bounded by a sum of pairwise
In practice, the MMD is computed under a specific kernel choice and approximating the expectations by their empirical estimates.
where is the kernel function associated to feature map . Experiments are conducted with various Gaussian RBF priors.
Finally, the model should learn a representation that is also adequate for the task at hand (here, classification). This is done by adding a classifier (two fully connected layers) on top of the representation minimizing a standard cross entropy loss term, between the input image label and the model output.
MMD functional space uses the
RBF (Gaussian) kernel.
The final objective is the weighted sum of the four aforementioned loss terms ( and ).
The model is trained similarly to
GANs: The discriminator and generator (auto-encoder) parameters are updated in two alternating optimization steps.
The method is evaluated on various classification tasks (digit, object and action recognition) in settings where the domains differ by small geometric changes (e.g., change of pose). They also compare to a large range of baselines, although none of them seem to have an adversarially learned representation space (maybe  would have been a good additional baseline).
Additionally, ablation experiments show that the three terms , and all have a positive effect on the final effect, even when taken individually.