First published: 2019/02/22 (6 months ago) Abstract: Traditional clustering algorithms such as K-means rely heavily on the nature
of the chosen metric or data representation. To get meaningful clusters, these
representations need to be tailored to the downstream task (e.g. cluster photos
by object category, cluster faces by identity). Therefore, we frame clustering
as a meta-learning task, few-shot clustering, which allows us to specify how to
cluster the data at the meta-training level, despite the clustering algorithm
itself being unsupervised.
We propose Centroid Networks, a simple and efficient few-shot clustering
method based on learning representations which are tailored both to the task to
solve and to its internal clustering module. We also introduce unsupervised
few-shot classification, which is conceptually similar to few-shot clustering,
but is strictly harder than supervised* few-shot classification and therefore
allows direct comparison with existing supervised few-shot classification
methods. On Omniglot and miniImageNet, our method achieves accuracy competitive
with popular supervised few-shot classification algorithms, despite using *no
labels* from the support set. We also show performance competitive with
state-of-the-art learning-to-cluster methods.
Disclaimer: I am the first author.
# Executive summary
- The authors propose a new method, [*Centroid Networks*](https://arxiv.org/pdf/1902.08605.pdf), for learning to cluster.
- Given example clusterings of data, the goal is to learn how to cluster new data following the same criterion.
- Centroid Networks basically consist of running K-means on Prototypical Network features, plus many tricks.
- They evaluate Centroid Networks on Omniglot and miniImageNet (supervised few-shot classification benchmarks).
- Centroid Networks can compete with Prototypical Networks (state of the art in supervised few-shot classification) despite using no supervision at evaluation time (the labels of the support set are completely ignored).
* **Simple training** (non end-to-end, very similar to prototypical networks, with additional tricks).
* **Very fast clustering** (nearly same running time as prototypical networks).
* The authors claim that the Sinkhorn K-means formulation is empirically very stable : any initialization is fine as long as symmetries are broken (in practice, they initialize all centroids at 0 and add a small gaussian noise to centroids at each step).
* Clusters **need to be balanced** now. Removing the balanced constraint is future work.
- They frame learning to cluster as a meta-learning problem, **few-shot clustering**.
- The goal is to cluster K*M images into K clusters of M images.
- Classes vary across tasks, but class semantics are the same (Omniglot: cluster by character, miniImageNet: cluster by object category).
- They also define a second task **unsupervised few-shot classification** solely for comparing with supervised few-shot classification methods.
Conceptually, Centroid Networks consist of training *Prototypical Networks* (meta-training), then running *K-means* on top of protonet representations at clustering time (meta-evaluation). However, the authors propose several tricks that significantly improve upon that baseline:
- **Center loss**: when pretraining, this extra regularization term penalizes the intra-class variance.
- **Sinkhorn assignments**: when pretraining, replace the softmax predictions p(y|x) with a formulation based on optimal transport (Sinkhorn distances).
- **Sinkhorn K-means**: at clustering time, run the Sinkhorn K-means algorithm on the learned representation
# Results on Few-Shot Classification Benchmarks:
- The task is **unsupervised few-shot classification**: cluster a *unlabeled* support set, then predict which clusters new images should be classified into.
- Target metric is **unsupervised accuracy**.
- *Unsupervised* few-shot classification is harder than *supervised* few-shot classification because *no labels* are given in the support set.
- Compare with reference oracle Prototypical networks, which can access labeled support set.
- Centroid Networks are almost as good as Protonets on Omniglot (99.1% vs. reference 99.7%)
- Centroid Networks are comparable to Protonets on miniImageNet (53.1% vs. reference 66.9%).
- The proposed "tricks" are useful because Centroid Networks beats K-Means (Protonet feature) baseline.
# Results on Learning to Cluster Benchmarks:
- The task is **few-shot clustering**. After training on 30 alphabets of Omniglot, the task is to cluster 20 new alphabets (20-47 characters, with 20 instances/character).
- Target metric is **clustering accuracy**.
- Centroid Networks beat all flavors of Constrained Clustering Networks (86.6% vs. 83.3%)
- Centroid Networks are about 100 times than CCN faster but less flexible (fixed cluster sizes).
The code is available at https://github.com/gabrielhuang/centroid-networks