E. Raff and J. Sylvester, DSAA 2018, [link]
tags: representation learning - dsaa - 2018
The proposed approach is most closely related to ALFR, which also relies on adversarial training.
The proposed model builds on the Domain Adversarial Network (DANN), originally introduced for unsupervised domain adaptation. Given labeled data from a source domain $\mathcal{S}$ and unlabeled data from a target domain $\mathcal{T}$, the goal is to learn a network that solves the classification task on both domains while learning a representation shared between $\mathcal{S}$ and $\mathcal{T}$.
The model is composed of a feature extractor $G_f$ which then branches off into a target branch $G_y$, predicting the target label $y$, and a domain branch $G_d$, predicting whether the input comes from domain $\mathcal{S}$ or $\mathcal{T}$. The model parameters $(\theta_f, \theta_y, \theta_d)$ are trained with the following saddle-point objective (written here in standard DANN notation):

$$\min_{\theta_f, \theta_y}\, \max_{\theta_d} \;\; \mathcal{L}_y\big(G_y(G_f(x)),\, y\big) \;-\; \lambda\, \mathcal{L}_d\big(G_d(G_f(x)),\, d\big)$$

where $\mathcal{L}_y$ and $\mathcal{L}_d$ are classification losses, $d$ is the domain label, and $\lambda$ balances the two terms.
The gradient updates for this saddle-point problem can be efficiently implemented using the Gradient Reversal Layer introduced with DANN.
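As a rough sketch (not the authors' code), a gradient reversal layer is the identity in the forward pass and multiplies incoming gradients by $-\lambda$ in the backward pass; a minimal numpy version with an explicit, hand-written backward might look like:

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales gradients by -lambda backward."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # No-op: features flow unchanged into the downstream classifier.
        return x

    def backward(self, grad_output):
        # Flip the sign so the feature extractor is pushed to *hurt*
        # the downstream classifier instead of helping it.
        return -self.lam * grad_output


grl = GradientReversal(lam=0.5)
features = np.array([1.0, -2.0, 3.0])
out = grl.forward(features)                       # identical to features
grads = grl.backward(np.array([0.2, 0.4, -0.6]))  # signs reversed, scaled by 0.5
```

In an autograd framework this would instead be registered as a custom backward rule, so that everything upstream of the layer receives the reversed gradient automatically.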
In Gradient Reversal Against Discrimination (GRAD), samples come from a single domain only, and the domain classifier is replaced by an attribute classifier $G_a$, whose goal is to predict the value of the protected attribute $a$.
In other words, the training objective strives to build a feature representation of the input $x$ that is good enough to predict the correct label $y$, but from which the protected attribute $a$ cannot easily be deduced.
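Writing $G_f$ for the feature extractor, $G_y$ for the target branch and $G_a$ for the attribute classifier (a sketch in standard DANN notation, not the paper's exact formulation), this amounts to the DANN saddle point with the domain term swapped for the attribute term:

$$\min_{\theta_f, \theta_y}\, \max_{\theta_a} \;\; \mathcal{L}_y\big(G_y(G_f(x)),\, y\big) \;-\; \lambda\, \mathcal{L}_a\big(G_a(G_f(x)),\, a\big)$$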
Figure: Diagram of the GRAD architecture. The red connection indicates normal forward propagation, but back-propagation reverses the sign of the gradient.
On the other hand, one could directly train a classification network penalized whenever it predicts the correct value of the attribute $a$; however, such a model could simply learn $a$ and trivially output an incorrect value. The proposed adversarial training scheme prevents this failure mode.
The authors also consider a variant of the model where the target branch instead solves an autoencoding/reconstruction task. The features learned by the encoder can then be used as the entry point of a smaller network for classification or any other downstream task.
The model is compared to the following baselines:

* **CNN**: trained without the protected attribute branch.
* **LFR**: a classifier with an intermediate latent code, trained with an objective that combines a classification loss (the model should accurately classify $x$), a reconstruction loss (the learned representation should encode enough information about the input to reconstruct it accurately) and a parity loss (estimate the probability of a positive outcome for both populations, $s = 0$ and $s = 1$, and strive to make them equal).
* **VAE**: the protected attribute $s$ is factorized out of the latent code $z$, and additional invariance is imposed via an MMD objective which tries to match the moments of the posterior distributions $q(z \mid s = 0)$ and $q(z \mid s = 1)$.
* **ALFR**: as in LFR, the model is trained with a reconstruction loss and a classification loss. Additionally, the dependence between the learned representation and the protected attribute is quantified by an adversarial classifier that tries to extract the attribute value from the representation, formulated and trained as in the Generative Adversarial Network (GAN) framework.
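To make the VAE baseline's moment-matching term concrete, here is a hypothetical minimal sketch (the function name and the choice of a linear kernel are illustrative, not from the paper): with a linear kernel, the squared MMD between the latent codes of the two groups reduces to the squared distance between their empirical means, i.e. first-moment matching.

```python
import numpy as np

def mmd_linear(z0, z1):
    """Linear-kernel MMD^2 between two sets of latent codes.

    With k(a, b) = a.b, MMD^2 reduces to || mean(z0) - mean(z1) ||^2,
    so minimizing it pulls the two groups' first moments together.
    """
    diff = z0.mean(axis=0) - z1.mean(axis=0)
    return float(diff @ diff)


rng = np.random.default_rng(0)
z_s0 = rng.normal(0.0, 1.0, size=(100, 8))  # latent codes for s = 0
z_s1 = rng.normal(0.5, 1.0, size=(100, 8))  # latent codes for s = 1 (shifted)
penalty = mmd_linear(z_s0, z_s1)            # > 0; added to the loss as a penalty
```

Richer kernels (e.g. RBF) would match higher moments as well, which is closer to what the VAE baseline actually uses.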
GRAD always reaches the highest consistency compared to the baselines. For the other metrics, the results are more mixed, although it usually achieves the best or second-best results. It is also not clear how to choose between GRAD-pred and GRAD-auto, as there does not seem to be a clear winner, although GRAD-pred is the more intuitive solution when supervision is available, as it directly solves the classification task.
The authors also report a small experiment showing that protecting several attributes at once can be more beneficial than protecting a single one. This is to be expected, as some attributes are highly correlated or interact in meaningful ways.
In particular, protecting several attributes at once can easily be done in the GRAD framework, for instance by making the attribute prediction branch multi-class; however, it is not clear from the paper how this is actually done in practice, nor whether the same idea could be integrated into the baselines for further comparison.
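One plausible way to extend the attribute branch to several binary protected attributes (a hypothetical sketch, not the paper's construction) is to give it one sigmoid output per attribute and sum the per-attribute binary cross-entropies; placed behind the reversal layer, this pushes the representation to hide all attributes at once:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_attribute_loss(logits, attrs):
    """Summed binary cross-entropy over several protected attributes.

    logits: (batch, n_attrs) raw scores from the attribute branch
    attrs:  (batch, n_attrs) binary protected-attribute labels

    The attribute branch minimizes this loss; the gradient reversal
    layer flips its gradient for the feature extractor, so the shared
    features are trained to make *every* attribute hard to predict.
    """
    p = sigmoid(logits)
    eps = 1e-12  # numerical guard for log(0)
    bce = -(attrs * np.log(p + eps) + (1 - attrs) * np.log(1 - p + eps))
    return float(bce.sum(axis=1).mean())
```

A confident, correct prediction drives the loss toward 0, while uninformative logits give roughly `n_attrs * log(2)`, i.e. chance level on every attribute.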