Learning To Count Objects in Images
Lempitsky, Victor S. and Zisserman, Andrew, 2010
Paper summary by joecohen
They introduce the idea of counting objects in images by predicting a density map. Training requires only dot annotations at the centers of objects. Each dot is expanded into a Gaussian to form a ground-truth density. A model is trained to predict this density, and the total count is recovered by integrating (summing) over the predicted density map.
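The dot-to-density step can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the function names, the image size, the dot positions and `sigma=2.0` are all assumptions chosen for the example. Each dot contributes a Gaussian kernel normalized to unit mass, so summing the map recovers the number of dots.

```python
import math

def gaussian_kernel(sigma, radius):
    """Square Gaussian kernel of side 2*radius+1, normalized to sum to 1."""
    size = 2 * radius + 1
    kernel = [[math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
               for x in range(size)] for y in range(size)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def density_from_dots(height, width, dots, sigma=2.0):
    """Add one unit-mass Gaussian per (row, col) dot annotation."""
    radius = math.ceil(3 * sigma)
    kernel = gaussian_kernel(sigma, radius)
    density = [[0.0] * width for _ in range(height)]
    for r, c in dots:
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = r + dy, c + dx
                # Mass falling outside the image is simply dropped.
                if 0 <= y < height and 0 <= x < width:
                    density[y][x] += kernel[dy + radius][dx + radius]
    return density

def count_from_density(density):
    """The discrete 'integral' of the density map is the sum of its pixels."""
    return sum(map(sum, density))

# Hypothetical annotations: three objects, all well inside a 40x40 image.
dots = [(10, 10), (20, 25), (30, 12)]
density = density_from_dots(40, 40, dots)
estimated_count = count_from_density(density)
```

In this sketch a dot closer to the border than the kernel radius would lose some of its mass, so the sum would fall slightly below the true count.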
They define a function $F$ that produces the density from quantized dense SIFT features \cite{lowe03distinctive} computed at every pixel of the image. A simplified definition of $F$ is shown below. Each pixel $p$ is represented by a feature vector $x_p$, and these vectors are used to train a model, with weights $w$, that implements $F$.
$$\forall p \in I, \hspace{10pt} F(p|w) = w^T x_p $$
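To make the linear model concrete, here is a minimal sketch that assumes each $x_p$ is a one-hot indicator of the pixel's dictionary entry. Under that assumption $w^T x_p$ is just a lookup of the weight for that entry. The function names, the 3-word dictionary and the weight values are hypothetical, and learning $w$ is not shown.

```python
def predict_density(codeword_map, w):
    """F(p|w) = w^T x_p; with one-hot x_p this is w[codeword at p]."""
    return [[w[k] for k in row] for row in codeword_map]

def predict_count(codeword_map, w):
    """Estimated count: sum of the predicted density over all pixels."""
    return sum(sum(row) for row in predict_density(codeword_map, w))

def count_from_histogram(codeword_map, w):
    """Equivalent form: dot product of w with the codeword histogram."""
    hist = [0] * len(w)
    for row in codeword_map:
        for k in row:
            hist[k] += 1
    return sum(wk * nk for wk, nk in zip(w, hist))

# Hypothetical 2x3 image quantized against a 3-word dictionary.
codeword_map = [[0, 0, 1],
                [2, 1, 0]]
w = [0.0, 0.25, 0.5]  # made-up weights, one per dictionary entry

count = predict_count(codeword_map, w)
```

Because the model is linear, the image-level count depends only on how many pixels fall into each dictionary entry, which is what `count_from_histogram` computes.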
They obtain the quantized dense SIFT features using the [VLFEAT](http://www.vlfeat.org/overview/dsift.html) library. The significant part of the code is shown below:
```
im = imread(['data/' num2str(j, '%03d') 'cell.png']);
im = im(:,:,3); %using the blue channel to compute data
disp('Computing dense SIFT...');
[f d] = vl_dsift(single(im)); %computing the dense sift descriptors centered at each pixel
%estimating the crop parameters where SIFTs were not computed:
minf = floor(min(f,[],2));
maxf = floor(max(f,[],2));
minx = minf(1);
miny = minf(2);
maxx = maxf(1);
maxy = maxf(2);
%simple quantized dense SIFT, each image is encoded as MxNx1 numbers of
%dictionary entries numbers with weight 1 (see the NIPS paper):
disp('Quantizing SIFTs...');
features{j} = vl_ikmeanspush(uint8(d),Dict);
features{j} = reshape(features{j}, maxy-miny+1, maxx-minx+1);
weights{j} = ones(size(features{j}));
```
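The quantization step in the excerpt above assigns each descriptor to an entry of the dictionary `Dict`. A rough Python sketch of that idea follows, assuming nearest-center assignment by squared Euclidean distance. The helper names and the toy 2-D descriptors are illustrative only (real SIFT descriptors are 128-dimensional), and indices here are 0-based whereas MATLAB's are 1-based.

```python
def quantize(descriptors, dictionary):
    """Return, for each descriptor, the index of the nearest dictionary center."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(dictionary)), key=lambda k: sqdist(d, dictionary[k]))
            for d in descriptors]

def to_grid(indices, rows, cols):
    """Arrange a flat index list into rows x cols, column-major like MATLAB's reshape."""
    return [[indices[c * rows + r] for c in range(cols)] for r in range(rows)]

# Toy data: two dictionary centers and four 2-D descriptors.
dictionary = [[0, 0], [10, 10]]
descriptors = [[1, 0], [9, 9], [0, 2], [10, 8]]

codes = quantize(descriptors, dictionary)
grid = to_grid(codes, 2, 2)
```

The resulting grid plays the role of `features{j}` in the excerpt: one dictionary index per pixel position where a descriptor was computed.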
They benchmark their algorithm on the "Bacterial cells in fluorescence-light microscopy images" dataset. The heatmap below shows the predicted density.
https://i.imgur.com/Vz463nu.png
The evaluation is performed by training on $N$ images (with another $N$ images used as a validation set) and then testing on 100 randomly picked images in a held-out set. They show that using more training images results in less variance and higher accuracy.
https://i.imgur.com/hihfC8V.png
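One way to quantify "accuracy" and "variance" in such an experiment is the mean absolute counting error on the test images, summarized by its mean and standard deviation over several random draws of the training set. The sketch below assumes that metric, which this summary does not state explicitly. All counts in it are made-up numbers for illustration, not results from the paper.

```python
import statistics

def mean_absolute_error(true_counts, predicted_counts):
    """Average of |true - predicted| over the test images."""
    errors = [abs(t - p) for t, p in zip(true_counts, predicted_counts)]
    return sum(errors) / len(errors)

# Hypothetical ground-truth counts for three test images.
true_counts = [100, 120, 80]

# Hypothetical predictions from two models trained on different random draws.
predictions_per_run = [
    [98, 125, 83],
    [101, 118, 86],
]

maes = [mean_absolute_error(true_counts, p) for p in predictions_per_run]
mae_mean = statistics.mean(maes)    # accuracy across runs
mae_std = statistics.stdev(maes)    # variability across runs
```

Repeating this for each value of $N$ would give one mean and one spread per training-set size, which is the kind of comparison the plot above summarizes.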
Paper website: http://www.robots.ox.ac.uk/~vgg/research/counting/index_org.html