Summary Statistics for Partitionings and Feature Allocations
Fidaner, Isik Baris and Cemgil, Ali Taylan, 2013
Paper summary (nipsreviews)

The authors propose novel approaches for summarizing the posterior over partitions in infinite mixture models. In applications this posterior is often quite diffuse, so the default MAP estimate is an unsatisfactory summary. The proposed approach is based on the cumulative block sizes, which count the number of clusters of size $\ge k$ for $k=1,\dots,n$. They also examine the projected cumulative block sizes, obtained when the partition is projected onto a subset of $\{1,\dots,n\}$. These quantities are summarized by the cumulative occurrence distribution, the per-element information of a set, the entropy, the projected entropy, and the subset occurrence. Finally, they propose an agglomerative clustering algorithm in which the projection entropy measures distances between sets. In the illustrations, the posterior over partitions is summarized by the dendrogram produced by the entropy agglomerative algorithm, alongside existing summaries such as the posterior histogram of the number of clusters and the pairwise occurrences.
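To make the cumulative block sizes and the projection step concrete, here is a minimal Python sketch. The function names (`project`, `cumulative_block_sizes`, `partition_entropy`) are hypothetical and not taken from the paper. The entropy shown is assumed to be the standard Shannon entropy of the relative block sizes; the paper's exact definition may differ.

```python
from math import log


def project(partition, subset):
    """Restrict each block to `subset`, dropping blocks that become empty."""
    subset = set(subset)
    projected = [set(block) & subset for block in partition]
    return [block for block in projected if block]


def cumulative_block_sizes(partition, n):
    """Return [c_1, ..., c_n], where c_k is the number of blocks of size >= k."""
    sizes = [len(block) for block in partition]
    return [sum(1 for s in sizes if s >= k) for k in range(1, n + 1)]


def partition_entropy(partition):
    """Shannon entropy of the relative block sizes (an assumed definition)."""
    sizes = [len(block) for block in partition]
    n = sum(sizes)
    return sum((s / n) * log(n / s) for s in sizes)


# A partition of {1, ..., 6} into blocks of sizes 3, 2 and 1.
partition = [{1, 2, 3}, {4, 5}, {6}]
counts = cumulative_block_sizes(partition, 6)

# Projecting onto the subset {1, 2, 4} leaves the blocks {1, 2} and {4}.
subset = {1, 2, 4}
projected = project(partition, subset)
projected_counts = cumulative_block_sizes(projected, len(subset))
```

In a posterior analysis these functions would be applied to each sampled partition and the results averaged over samples, which is one way to read the summaries listed above.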