Revistes Catalanes amb Accés Obert (RACO)

MMKK++ algorithm for clustering heterogeneous images into an unknown number of clusters

Dávid Papp, Gábor Szűcs


In this paper we present an automatic clustering procedure whose main aim is to predict the number of clusters in a set of unknown, heterogeneous images. We used the Fisher vector as the mathematical representation of the images, and these vectors served as the input data points for the clustering algorithm. We implemented a novel variant of K-means, kernel K-means++, and further the min-max kernel K-means++ (MMKK++), as the clustering method. The proposed approach examines several candidate cluster numbers and determines the strength of the clustering to estimate how well the data fit into K clusters; the law of large numbers was then used to choose the optimal number of clusters. We conducted experiments on four image sets to demonstrate the efficiency of our solution. The first two image sets are subsets of different popular collections; the third is their union; the fourth is the complete Caltech101 image set. The results showed that our approach estimated the number of clusters more accurately than the competing methods. Furthermore, we defined two new metrics for evaluating the predicted cluster number, which measure the goodness of the prediction in a more fine-grained way than a binary evaluation.
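
As a rough illustration of the clustering step described above, the following Python sketch runs kernel K-means with K-means++-style seeding on a precomputed kernel matrix. It is not the authors' MMKK++ implementation: the min-max weighting, the Fisher-vector features, and the strength-based selection of K are all omitted. The RBF kernel, the function names, and the parameters (gamma, n_iter, seed) are assumptions chosen only for illustration.

```python
import math
import random


def rbf_kernel_matrix(points, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in points] for p in points]


def feature_dist2(K, i, s):
    """Squared feature-space distance: K[i][i] + K[s][s] - 2 K[i][s]."""
    return max(K[i][i] + K[s][s] - 2.0 * K[i][s], 0.0)


def kernel_kmeanspp_seeds(K, k, rng):
    """Pick k seed indices, sampling proportionally to the squared
    feature-space distance from the nearest seed chosen so far."""
    n = len(K)
    seeds = [rng.randrange(n)]
    d2 = [feature_dist2(K, i, seeds[0]) for i in range(n)]
    while len(seeds) < k:
        if sum(d2) > 0:
            s = rng.choices(range(n), weights=d2, k=1)[0]
        else:
            # All points coincide with a seed; fall back to uniform sampling.
            s = rng.randrange(n)
        seeds.append(s)
        d2 = [min(d2[i], feature_dist2(K, i, s)) for i in range(n)]
    return seeds


def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Lloyd-style kernel K-means on a precomputed kernel matrix K."""
    rng = random.Random(seed)
    n = len(K)
    seeds = kernel_kmeanspp_seeds(K, k, rng)
    # Initial assignment: each point joins its nearest seed.
    labels = [min(range(k), key=lambda c: feature_dist2(K, i, seeds[c]))
              for i in range(n)]
    for _ in range(n_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Self-similarity of each implicit centroid (mean pairwise kernel).
        within = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels
```

Working directly on the kernel matrix means the feature-space centroids never have to be formed explicitly. Under the same assumptions, a procedure that searches for the number of clusters would call kernel_kmeans once per candidate K and compare the resulting partitions.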
