A latent semantic retrieval and clustering system for personal photos with sparse speech annotation
SSCS '09 Proceedings of the third workshop on Searching spontaneous conversational speech
While users prefer high-level semantic photo descriptions (e.g., who, what, when, where), we wish to minimize the annotation effort such descriptions require of the user. We propose a latent semantic personal photo retrieval approach based on fused image/speech/text features. Low-level image features are used to derive relationships among sparsely annotated photos, and probabilistic latent semantic analysis (PLSA) models built on the fused image/speech/text features are used to analyze photo “topics”. Photos can then be retrieved with text or speech queries consisting only of simple high-level semantic words. In preliminary experiments, with only 10% of the photos manually annotated, the photos could still be retrieved with very encouraging results.
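As an illustrative sketch of the core modeling step, the following implements plain PLSA by EM on a document-by-term count matrix. This is a generic textbook formulation, not the authors' system: the fused image/speech/text feature extraction and the propagation of sparse annotations via low-level image similarity are assumed to have already produced the count matrix, and all function and variable names here are hypothetical.

```python
import numpy as np

def plsa(counts, n_topics=2, n_iter=100, seed=0):
    """Fit PLSA by EM on a (photos x terms) count matrix.

    Returns P(z|d) (per-photo topic mixtures) and P(w|z)
    (per-topic term distributions). A query expressed as simple
    semantic words could be matched against P(z|d) after folding in.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization of the two parameter sets, row-normalized.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) ∝ P(z|d) P(w|z),
        # computed for all (doc, word, topic) triples at once.
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        joint /= joint.sum(axis=2, keepdims=True) + 1e-12
        # Expected counts n(d,w) * P(z|d,w).
        nzw = counts[:, :, None] * joint
        # M-step: re-estimate both distributions from expected counts.
        p_z_d = nzw.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
        p_w_z = nzw.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

On a toy matrix with two clearly separated term blocks, the learned P(z|d) assigns photos sharing vocabulary to the same dominant topic, which is the property retrieval by high-level semantic words relies on.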