Centroid Networks for Few-Shot Clustering and Unsupervised Few-Shot Classification on ShortScience.org


Gabriel Huang and Hugo Larochelle and Simon Lacoste-Julien
arXiv e-Print archive - 2019 via Local arXiv
Keywords: cs.LG, stat.ML

Summary by gabriel 5 years ago

Disclaimer: I am the first author.

# Executive summary
- The authors propose a new method, [*Centroid Networks*](https://arxiv.org/pdf/1902.08605.pdf), for learning to cluster. 
- Given example clusterings of data, the goal is to learn how to cluster new data following the same criterion.
- Centroid Networks essentially consist of running K-means on Prototypical Network features, plus several tricks.
- They evaluate Centroid Networks on Omniglot and miniImageNet (supervised few-shot classification benchmarks). 
- Centroid Networks can compete with Prototypical Networks (state of the art in supervised few-shot classification) despite using no supervision at evaluation time (the labels of the support set are completely ignored).

## Pros
* **Simple training** (not end-to-end; very similar to Prototypical Networks, with additional tricks).
* **Very fast clustering** (nearly the same running time as Prototypical Networks).
* The authors claim that the Sinkhorn K-means formulation is empirically very stable: any initialization works as long as symmetries are broken (in practice, they initialize all centroids at 0 and add small Gaussian noise to the centroids at each step).

## Cons
* Clusters **need to be balanced** (all clusters must have the same size). Removing the balance constraint is left for future work.

# Setting
- They frame learning to cluster as a meta-learning problem, **few-shot clustering**. 
- The goal is to cluster K×M images into K clusters of M images each.
- The classes differ across tasks, but the clustering criterion stays the same (Omniglot: cluster by character; miniImageNet: cluster by object category).
- They also define a second task, **unsupervised few-shot classification**, solely to compare with supervised few-shot classification methods.
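The balanced K×M setup is easier to see in code. Below is a minimal Python sketch of how one few-shot clustering task could be sampled from a labeled dataset. The helper `sample_episode` and its parameters are hypothetical and do not come from the Centroid Networks code. It only illustrates the episode structure described above.

```python
import numpy as np

def sample_episode(labels, K, M, rng):
    """Hypothetical helper: sample one few-shot clustering task.

    Picks K distinct classes with at least M examples each, draws M examples
    per class without replacement, and shuffles the result so that the order
    of the images leaks no label information.
    Returns (dataset indices, episode-local labels in 0..K-1).
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    eligible = classes[counts >= M]
    chosen = rng.choice(eligible, size=K, replace=False)

    idx, episode_labels = [], []
    for k, c in enumerate(chosen):
        members = np.flatnonzero(labels == c)
        idx.extend(rng.choice(members, size=M, replace=False))
        episode_labels.extend([k] * M)

    # Shuffle images and their episode labels together.
    perm = rng.permutation(K * M)
    return np.array(idx)[perm], np.array(episode_labels)[perm]

# Toy dataset: 10 classes with 25 images each; sample one task with K = 5, M = 20.
rng = np.random.default_rng(0)
toy_labels = np.repeat(np.arange(10), 25)
idx, y = sample_episode(toy_labels, K=5, M=20, rng=rng)
print(idx.shape, np.bincount(y))  # (100,) [20 20 20 20 20]
```

At clustering time the episode labels `y` would be hidden from the model and used only to score the resulting partition.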

# Method

Conceptually, Centroid Networks consist of training *Prototypical Networks* (meta-training), then running *K-means* on top of protonet representations at clustering time (meta-evaluation).  However, the authors propose several tricks that significantly improve upon that baseline:
- **Center loss**: when pretraining, this extra regularization term penalizes the intra-class variance.
- **Sinkhorn assignments**: when pretraining, replace the softmax predictions p(y|x) with a formulation based on optimal transport (Sinkhorn distances).
- **Sinkhorn K-means**: at clustering time, run the Sinkhorn K-means algorithm on the learned representations.
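The three ingredients can be sketched in NumPy. This is a minimal sketch under several assumptions: squared Euclidean costs, uniform (balanced) marginals, and the zero-plus-noise initialization mentioned under Pros. The names `center_loss`, `sinkhorn` and `sinkhorn_kmeans`, the log-domain Sinkhorn iterations, and the default values for `gamma`, `n_steps` and `noise` are all illustrative choices. They are not the authors' implementation, which is in the repository linked in the Code section.

```python
import numpy as np
from scipy.special import logsumexp

def center_loss(z, y):
    """Mean squared distance of each embedding to its own class mean
    (one common form of center loss; it penalizes intra-class variance)."""
    classes = np.unique(y)
    centers = np.stack([z[y == c].mean(axis=0) for c in classes])
    own = np.searchsorted(classes, y)  # index of each sample's class in `classes`
    return float(np.mean(np.sum((z - centers[own]) ** 2, axis=1)))

def sinkhorn(cost, gamma=1.0, n_iters=100):
    """Entropy-regularized optimal transport with uniform marginals:
    each of the N points sends mass 1/N, each of the K centroids receives 1/K.
    Iterates in log space so that a small gamma does not underflow.
    Returns the N x K transport plan."""
    n, k = cost.shape
    log_a, log_b = -np.log(n), -np.log(k)
    log_kernel = -cost / gamma
    log_u, log_v = np.zeros(n), np.zeros(k)
    for _ in range(n_iters):
        log_u = log_a - logsumexp(log_kernel + log_v[None, :], axis=1)  # fit row sums
        log_v = log_b - logsumexp(log_kernel + log_u[:, None], axis=0)  # fit column sums
    return np.exp(log_u[:, None] + log_kernel + log_v[None, :])

def sinkhorn_kmeans(x, k, gamma=1.0, n_steps=50, noise=1e-3, rng=None):
    """Balanced K-means: Sinkhorn soft assignments (E-step), then weighted
    centroid means (M-step). Centroids start at 0 and receive small Gaussian
    noise at every step to break symmetry."""
    rng = np.random.default_rng() if rng is None else rng
    centroids = np.zeros((k, x.shape[1]))
    for _ in range(n_steps):
        centroids = centroids + noise * rng.standard_normal(centroids.shape)
        cost = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        plan = sinkhorn(cost, gamma)
        # M-step: weighted mean of the points transported to each centroid.
        centroids = (plan.T @ x) / plan.sum(axis=0)[:, None]
    return plan.argmax(axis=1), centroids

# Toy usage: three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
x = np.concatenate([m + 0.5 * rng.standard_normal((20, 2)) for m in means])
true = np.repeat(np.arange(3), 20)
pred, centroids = sinkhorn_kmeans(x, k=3, rng=rng)
print(np.bincount(pred, minlength=3))
```

The balance constraint lives entirely in the uniform column marginal 1/K. This is also why the Cons section notes that clusters must have equal sizes. For the Sinkhorn-assignment training trick, one option (again an assumption) is to rescale each row of the `sinkhorn` plan to sum to 1 and use it in place of the softmax p(y|x) over prototypes.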

# Results on Few-Shot Classification Benchmarks

- The task is **unsupervised few-shot classification**: cluster an *unlabeled* support set, then predict which cluster each new (query) image belongs to.
- Target metric is **unsupervised accuracy**.
- *Unsupervised* few-shot classification is harder than *supervised* few-shot classification because *no labels* are given in the support set.
- Prototypical Networks, which have access to the labeled support set, serve as a reference oracle.
- Centroid Networks are almost as good as Protonets on Omniglot (99.1% vs. reference 99.7%)
- On miniImageNet, Centroid Networks remain reasonably competitive with Protonets, although the gap is larger (53.1% vs. reference 66.9%).
- The proposed tricks are useful: Centroid Networks beat the baseline of running plain K-means on Protonet features.
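Both unsupervised accuracy and clustering accuracy need a way to match arbitrary cluster ids to class ids. Below is a minimal sketch. It assumes the usual approach of finding the best one-to-one mapping with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`). For unsupervised accuracy, it also assumes the mapping is fixed on the support set and then applied to the query predictions. The helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_mapping(pred, true, k):
    """Map each cluster id to a class id so that total agreement is maximal."""
    counts = np.zeros((k, k), dtype=int)
    np.add.at(counts, (pred, true), 1)  # counts[cluster, class]
    rows, cols = linear_sum_assignment(counts, maximize=True)
    mapping = np.empty(k, dtype=int)
    mapping[rows] = cols
    return mapping

def clustering_accuracy(pred, true, k):
    """Fraction of points whose cluster maps to their true class."""
    mapping = best_mapping(pred, true, k)
    return float(np.mean(mapping[pred] == true))

def unsupervised_accuracy(support_pred, support_true, query_pred, query_true, k):
    """Fix the cluster-to-class mapping on the support set, then score the
    query images' cluster assignments (e.g. nearest centroid) with it."""
    mapping = best_mapping(support_pred, support_true, k)
    return float(np.mean(mapping[query_pred] == query_true))

# Toy usage: the support partition is perfect but uses permuted cluster ids.
support_true = np.array([0, 0, 1, 1, 2, 2])
support_pred = np.array([2, 2, 0, 0, 1, 1])
query_true = np.array([0, 1, 2])
query_pred = np.array([2, 0, 0])  # third query lands in the wrong cluster
print(clustering_accuracy(support_pred, support_true, k=3))  # 1.0
print(unsupervised_accuracy(support_pred, support_true, query_pred, query_true, k=3))  # 2/3
```

Because the mapping is computed from the support set alone, the labels are never used to produce the clustering itself. They only determine how the clustering is scored.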

https://i.imgur.com/acQpQeq.png
https://i.imgur.com/FlHf9Ko.png

# Results on Learning to Cluster Benchmarks

- The task is **few-shot clustering**. After training on 30 Omniglot alphabets, the task is to cluster each of the 20 held-out alphabets (20-47 characters per alphabet, 20 instances per character).
- Target metric is **clustering accuracy**.
- Centroid Networks beat all flavors of Constrained Clustering Networks (CCN) (86.6% vs. 83.3%).
- Centroid Networks are about 100 times faster than CCN, but less flexible (they require fixed cluster sizes).

https://i.imgur.com/PvH5V1W.png

# Code

The code is available at https://github.com/gabrielhuang/centroid-networks
