This paper proposes an image-based model of visual clutter perception (clutter being "a crowded, disorderly state"). For a given image, the model first applies an existing superpixel clustering algorithm, then computes intensity, colour and orientation histograms over the pixels within each superpixel. Boundaries between adjacent superpixels are then either retained or removed, with removal merging the superpixels into "proto-objects". The novel merging algorithm operates on the Earth Mover's Distance (EMD), a measure of the dissimilarity between two histograms. For each image and each feature, the distribution of histogram distances is modelled as a mixture of two Weibull distributions. The crossover point between the two distributions (or a fixed cumulative percentile, if model selection prefers a single distribution) serves as the merging threshold: an edge is labelled ``similar'', and the two superpixels are merged, if their distance falls below the threshold for all three features. The clutter value for each image is the ratio of the final number of proto-objects to the initial number of superpixels (i.e. values near 0 indicate that almost all superpixels were merged, hence little clutter; 1 indicates that every superpixel remains a separate proto-object, hence maximal clutter). The model is validated against human clutter rankings of a subset of an existing image database. Human observers rank the images from least to most cluttered, and the median rank of each image is taken as the ground truth for clutter perception. The new model correlates more highly with the human clutter rankings than do several previous clutter perception and image segmentation models, and than human object segmentations from a previous study.
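
To make the merging and scoring steps summarised above concrete, here is a minimal sketch in Python. It is not the authors' implementation. Several simplifications are assumed: the per-feature thresholds are passed in directly rather than estimated from a Weibull mixture, every histogram is treated as a non-circular 1-D histogram with unit bin spacing, an edge counts as similar when the EMD is at or below the threshold for every feature, and merges are taken to be transitive (connected components). The names `emd_1d` and `clutter_score` are hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D histograms (unit bin spacing).

    Both histograms are normalised to sum to 1; the distance is then the
    summed absolute difference between their cumulative distributions.
    """
    sp, sq = sum(p), sum(q)
    cdf_p = accumulate(x / sp for x in p)
    cdf_q = accumulate(x / sq for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def clutter_score(histograms, edges, thresholds):
    """Ratio of merged proto-objects to initial superpixels.

    histograms: {feature name: list of histograms, one per superpixel}
    edges:      list of (i, j) index pairs of adjacent superpixels
    thresholds: {feature name: merge threshold} (assumed given here)
    """
    n = len(next(iter(histograms.values())))
    parent = list(range(n))  # union-find forest over superpixels

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        # Merge only if the pair is similar in every feature.
        similar = all(
            emd_1d(hists[i], hists[j]) <= thresholds[feature]
            for feature, hists in histograms.items()
        )
        if similar:
            parent[find(i)] = find(j)

    n_proto_objects = len({find(i) for i in range(n)})
    return n_proto_objects / n


# Toy example: four superpixels in a chain; 0 and 1 look alike, as do 2 and 3.
a, b = [1, 0, 0, 0], [0, 0, 0, 1]
toy_histograms = {f: [a, a, b, b] for f in ("intensity", "colour", "orientation")}
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_thresholds = {"intensity": 0.5, "colour": 0.5, "orientation": 0.5}
print(clutter_score(toy_histograms, toy_edges, toy_thresholds))  # 0.5
```

In the toy example two proto-objects remain out of four superpixels, giving a clutter value of 0.5. The sketch also shows why the score cannot reach exactly 0: at least one proto-object always remains, so the minimum is 1 divided by the number of superpixels.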