Garnelo et al., ICML 2018, [link]
tags: structured learning - graphical models - icml - 2018
In the Conditional Neural Processes (CNP) setting, we are given $n$ labeled points $O = \{(x_i, y_i)\}_{i=0}^{n-1}$, called observations, and another set of unlabeled targets $T = \{x_i\}_{i=n}^{n+m-1}$. We assume that the outputs are a realization of the following process: given a distribution $P$ over functions $f : X \to Y$, sample $f \sim P$, and set $y_i = f(x_i)$ for all $x_i$ in the target set.
The goal is to learn a prediction model for the output samples while trying to obtain the same flexibility as Gaussian processes, rather than using the standard supervised learning paradigm of deep neural networks. The main drawback of standard Gaussian Processes is that they do not scale well: inference costs $\mathcal{O}((n+m)^3)$.
CNPs give up on the theoretical guarantees of the Gaussian Process framework in exchange for more flexibility. In particular, observations are encoded in a representation of fixed dimension, independent of $n$.
In other words, we first encode each observation as $r_i = h_\theta(x_i, y_i)$ and combine these embeddings via a commutative operator $\oplus$ to obtain a fixed-size representation of the observations, $r = r_1 \oplus r_2 \oplus \cdots \oplus r_n$. Then for each new target $x_t$ we obtain parameters $\phi_t = g_\theta(x_t, r)$ conditioned on $r$ and $x_t$, which determine the stochastic process $Q_\theta(f(x_t) \mid O, x_t)$ to draw outputs from.
Figure: Conditional Neural Processes Architecture.
In practice, $\oplus$ is taken to be the mean operation, i.e., $r$ is the average of the $r_i$s over all observations. For regression tasks, $Q_\theta$ is a Gaussian distribution parameterized by mean $\mu_t$ and variance $\sigma_t^2$. For classification tasks, $\phi_t$ simply encodes a discrete distribution over classes.
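As a rough sketch (not the paper's implementation), the encode-aggregate-decode pipeline can be written in numpy, with random-weight MLPs standing in for the learned encoder $h_\theta$ and decoder $g_\theta$; the layer sizes and the 8-dimensional representation are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight ReLU MLP, just to make the sketch runnable."""
    Ws = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    def f(x):
        for W in Ws[:-1]:
            x = np.maximum(x @ W, 0.0)
        return x @ Ws[-1]
    return f

# Hypothetical dimensions: 1D inputs/outputs, 8-dim representation r.
h = mlp([2, 32, 8])   # encoder: (x_i, y_i) -> r_i
g = mlp([9, 32, 2])   # decoder: (x_t, r)   -> (mu, log_sigma)

def cnp_forward(x_obs, y_obs, x_tgt):
    # Encode each observation, then aggregate with the mean operator.
    r_i = h(np.stack([x_obs, y_obs], axis=-1))   # shape (n, 8)
    r = r_i.mean(axis=0)                         # fixed size, independent of n
    # Decode each target conditioned on r.
    phi = g(np.concatenate([x_tgt[:, None],
                            np.broadcast_to(r, (len(x_tgt), 8))], axis=-1))
    mu, log_sigma = phi[:, 0], phi[:, 1]
    return mu, np.exp(log_sigma)

x = np.linspace(-1, 1, 5)
mu, sigma = cnp_forward(x, np.sin(x), np.linspace(-1, 1, 7))
```

Note how the mean aggregation is what makes $r$ a fixed-size summary: any number of observations collapses to one 8-dimensional vector.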
Given observations $\{(x_i, y_i)\}_{i=0}^{n-1}$, we sample $N$ uniformly in $\{0, \dots, n-1\}$ and train the model to predict labels for the whole observation set, conditioned only on the subset $O_N = \{(x_i, y_i)\}_{i=0}^{N-1}$, by minimizing the negative log-likelihood:

$$\mathcal{L}(\theta) = -\mathbb{E}_{f \sim P}\Big[\, \mathbb{E}_N \big[ \log Q_\theta\big(\{y_i\}_{i=0}^{n-1} \mid O_N, \{x_i\}_{i=0}^{n-1}\big) \big] \Big]$$
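A schematic Monte Carlo version of one training step, assuming a Gaussian output distribution; the `model` callable here is a hypothetical stand-in for the CNP forward pass, assumed to return per-point `(mu, sigma)`:

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2), summed over points."""
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2)
                  + 0.5 * ((y - mu) / sigma) ** 2)

def training_loss(model, x, y):
    # Sample N ~ Uniform{0, ..., n-1}, condition on the first N observations,
    # and score the model's predictions on all n points.
    n = len(x)
    N = rng.integers(0, n)
    mu, sigma = model(x[:N], y[:N], x)
    return gaussian_nll(y, mu, sigma)
```

Minimizing this loss over many sampled curves and subsets approximates the expectation over $f \sim P$ and $N$ in the objective.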
Note: The sampling step for $N$ is not entirely clear, but it seems to model the stochasticity in the output given a partial set of observations.
The model scales as $\mathcal{O}(n + m)$, i.e., linear time, which is much better than the cubic cost of Gaussian Processes.
1D regression: We generate a dataset that consists of functions drawn from a GP with an exponential kernel. At every training step we sample a curve from the GP ($f \sim P$), select a subset of points as observations, and a subset of points as target points. The output distribution on the target labels is parameterized as a Gaussian whose mean and variance are output by $g_\theta$.
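The data-generation step can be sketched as follows; a squared-exponential kernel is used here as a stand-in, and the length scale, jitter, grid, and observation count are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_gp_curve(x, length_scale=0.4, jitter=1e-6):
    """Draw one function realization from a GP prior with a
    squared-exponential kernel at the query points x."""
    d = x[:, None] - x[None, :]
    K = np.exp(-0.5 * (d / length_scale) ** 2) + jitter * np.eye(len(x))
    # Cholesky factor turns i.i.d. normals into a correlated GP sample.
    return np.linalg.cholesky(K) @ rng.normal(size=len(x))

x = np.linspace(-2, 2, 100)
y = sample_gp_curve(x)

# Hypothetical split: a few random points as observations, the rest as targets.
idx = rng.permutation(len(x))
obs_idx, tgt_idx = idx[:10], idx[10:]
```

Each training step would draw a fresh curve and a fresh split, so the model never sees the same (observations, targets) pair twice.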
Image completion. Here $f$ is a function that maps a pixel coordinate to an RGB triple. At each iteration, an image from the training set is chosen, a subset of its pixels is selected as observations, and the aim is to predict the RGB values of the remaining pixels conditioned on those. In particular, it is interesting to see that CNPs allow for flexible conditioning patterns, contrary to other conditional generative models, which are often constrained by either the architecture (e.g. PixelCNN) or the training scheme.
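For illustration, turning an image into (coordinate, value) pairs for this task might look like the following sketch; the normalization of coordinates to $[0, 1]^2$ is an assumption:

```python
import numpy as np

def image_to_pairs(img):
    """Flatten an H x W x 3 image into per-pixel (coordinate, RGB) pairs,
    with coordinates normalized to [0, 1]^2."""
    H, W, _ = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([ys / (H - 1), xs / (W - 1)], axis=-1).reshape(-1, 2)
    rgb = img.reshape(-1, 3)
    return coords, rgb
```

Any subset of these pairs can then serve as observations, which is what makes arbitrary conditioning patterns (random pixels, half an image, etc.) straightforward.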
Few-shot classification. Consider a dataset with many classes but only a few examples per class (e.g. Omniglot). In this set of experiments, the model is trained to predict labels for samples in a selected subset of classes, while being conditioned on all remaining samples in the dataset.