Content-based retrieval in endomicroscopy: toward an efficient smart atlas for clinical diagnosis
MCBR-CDS'11: Proceedings of the Second MICCAI International Conference on Medical Content-Based Retrieval for Clinical Decision Support
Evaluating content-based retrieval (CBR) is challenging because it requires adequate ground truth. When the available ground truth is limited to textual metadata such as pathological classes, retrieval results can only be evaluated indirectly, for example in terms of classification performance. In this study, we first present a tool to generate perceived-similarity ground truth that enables direct evaluation of endomicroscopic video retrieval. The tool uses a four-point Likert scale to collect the subjective pairwise similarities perceived by multiple expert observers. We then evaluate a previously developed dense bag-of-visual-words method for endomicroscopic video retrieval against the generated ground truth. Confirming the results of earlier indirect evaluation based on classification, our direct evaluation shows that this method significantly outperforms several other state-of-the-art CBR methods. In a second step, we propose to improve the CBR method by learning an adjusted similarity metric from the perceived-similarity ground truth. By minimizing a margin-based cost function that differentiates similar and dissimilar video pairs, we learn a weight vector applied to the visual-word signatures of the videos. Using cross-validation, we demonstrate that the learned similarity distance correlates significantly better with the perceived similarity than the original visual-word-based distance does.
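
The abstract describes learning a weight vector over visual-word signatures by minimizing a margin-based cost that differentiates similar and dissimilar video pairs. The sketch below illustrates one way such a cost could be minimized; the hinge-loss form, the fixed distance threshold, the subgradient-descent optimizer, and all function and parameter names are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def learn_weights(signatures, pairs, labels, threshold=1.0, margin=0.5,
                  lr=0.01, epochs=100):
    """Learn non-negative per-visual-word weights by stochastic subgradient
    descent on a hinge loss that pushes the weighted squared distance of
    perceived-similar pairs below `threshold - margin` and of dissimilar
    pairs above `threshold + margin`.

    signatures : (n_videos, n_words) array of visual-word signatures
    pairs      : list of (i, j) video index pairs rated by the observers
    labels     : +1 for pairs rated similar, -1 for pairs rated dissimilar
    """
    w = np.ones(signatures.shape[1])
    for _ in range(epochs):
        for (i, j), y in zip(pairs, labels):
            diff_sq = (signatures[i] - signatures[j]) ** 2
            d = np.dot(w, diff_sq)  # weighted squared distance
            # The hinge is active when the pair violates its margin:
            # similar pairs (y = +1) want d <= threshold - margin,
            # dissimilar pairs (y = -1) want d >= threshold + margin.
            if y * (threshold - d) < margin:
                w -= lr * y * diff_sq  # subgradient step on the active hinge
        np.maximum(w, 0.0, out=w)  # non-negative weights keep the metric valid
    return w
```

Constraining the weights to be non-negative keeps the learned distance a valid pseudo-metric while still allowing uninformative visual words to be down-weighted to zero.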
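
To quantify how well a distance agrees with the perceived-similarity ratings, a rank correlation between distances and averaged Likert scores is one plausible choice; the abstract does not name the correlation measure used, so Spearman's rank correlation below, like the helper names, is an assumption made for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def correlation_with_perception(w, signatures, pairs, perceived_scores):
    """Rank correlation between learned weighted distances and perceived
    similarity scores (e.g., per-pair Likert ratings averaged over observers).
    Distance and similarity vary in opposite directions, so distances are
    negated before correlating."""
    dists = np.array([np.dot(w, (signatures[i] - signatures[j]) ** 2)
                      for i, j in pairs])
    rho, pval = spearmanr(-dists, perceived_scores)
    return rho, pval
```

In a cross-validation setting, `learn_weights` would be fit on the training pairs of each fold and `correlation_with_perception` evaluated on the held-out pairs.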