Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision.
Object Categorization by Learned Universal Visual Dictionary. ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision, Volume 2.
Evaluation campaigns and TRECVid. MIR '06 Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval.
k-means++: the advantages of careful seeding. SODA '07 Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms.
LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST).
Adapted vocabularies for generic visual categorization. ECCV '06 Proceedings of the 9th European Conference on Computer Vision, Volume Part IV.
Bag of Visual Words (BoW) is widely regarded as the standard representation of the visual information present in images and is broadly used for retrieval and concept detection in videos. Generating the visual vocabulary in the BoW framework typically involves a quantization step that clusters image features into a limited number of visual words. Because this quantization is achieved through unsupervised clustering, it takes no advantage of the relationships between features coming from images that belong to similar concepts, which enlarges the semantic gap. We present a new dictionary construction technique that improves the BoW representation by increasing its discriminative power. Our solution is based on a two-step quantization: we start with k-means clustering, followed by a bottom-up supervised clustering that uses the features' label information. Results on the TRECVID 2007 data [8] show improvements with the proposed BoW construction. We also give upper bounds on the improvement over the baseline for the retrieval rate of each concept, using the best supervised merging criterion.
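The two-step quantization described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the descriptors and concept labels are synthetic, the first step is a plain NumPy k-means (a stand-in for whichever clustering the pipeline uses), and the supervised merging criterion shown here (merge clusters that share the same dominant concept label) is a hypothetical stand-in for the paper's merging criteria.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 300 local descriptors (e.g. SIFT-like vectors)
# and a concept label for each feature's source image.
feats = rng.normal(size=(300, 8))
labels = rng.integers(0, 3, size=300)

# Step 1: unsupervised k-means quantization into an initial vocabulary.
k = 20
centers = feats[rng.choice(len(feats), size=k, replace=False)].copy()
for _ in range(10):
    dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    assign = dists.argmin(axis=1)
    for c in range(k):
        members = feats[assign == c]
        if len(members):
            centers[c] = members.mean(axis=0)

# Step 2: supervised merging -- map each k-means cluster to the concept
# label most frequent among its member features, then merge clusters
# sharing that dominant label into one visual word (a toy purity
# criterion; the real criteria would merge more conservatively).
dominant = {}
for c in range(k):
    member_labels = labels[assign == c]
    dominant[c] = int(np.bincount(member_labels).argmax()) if len(member_labels) else -1

words = np.array([dominant[c] for c in assign])  # final visual-word index
print("initial vocabulary size:", k)
print("merged vocabulary size:", len(set(words.tolist())))
```

Because merging only ever combines existing clusters, the supervised step can shrink the vocabulary but never grow it, which is what lets label information sharpen (rather than fragment) the word assignments.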