Gaze- and speech-enhanced content-based image retrieval in image tagging

Authors:
He Zhang;Teemu Ruokolainen;Jorma Laaksonen;Christina Hochleitner;Rudolf Traunmüller
Affiliations:
Department of Information and Computer Science, Aalto University School of Science, Espoo, Finland;Department of Information and Computer Science, Aalto University School of Science, Espoo, Finland;Department of Information and Computer Science, Aalto University School of Science, Espoo, Finland;Celum GmbH, Linz, Austria;Celum GmbH, Linz, Austria
Venue:
ICANN'11: Proceedings of the 21st International Conference on Artificial Neural Networks, Volume Part II
Year:
2011

Abstract

We describe a setup and experiments in which users check and correct image tags produced by an automatic tagging system. We study how much a content-based image retrieval (CBIR) method speeds up finding and correcting the erroneously tagged images. We also analyze implicit relevance feedback derived from the user's gaze-tracking patterns as a means of boosting CBIR performance. Finally, we use automatic speech recognition to enter the correct tags for the wrongly tagged images. The experiments show a large variance in tagging-task performance, which we believe is caused primarily by the users' subjective interpretation of image contents and by their varying familiarity with the gaze-tracking and speech-recognition setups. The results suggest that gaze- and/or speech-enhanced CBIR has potential in image tagging, at least for some users.
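The abstract does not say how gaze patterns become relevance feedback. As a rough illustration only, the Python sketch below assumes that a longer dwell time on a displayed image signals relevance. It turns dwell times into soft weights and feeds them into a Rocchio-style query update over image feature vectors, then re-ranks the database by Euclidean distance. The dwell-time threshold, the Rocchio coefficients, the function names, and the choice of Rocchio itself are all hypothetical assumptions, not the authors' method.

```python
import numpy as np

def gaze_relevance(dwell_ms, threshold_ms=500.0):
    # Hypothetical mapping: dwell time (ms) -> implicit relevance weight in [0, 1].
    dwell = np.asarray(dwell_ms, dtype=float)
    return np.clip(dwell / threshold_ms, 0.0, 1.0)

def rocchio_update(query, shown, weights, alpha=1.0, beta=0.75, gamma=0.15):
    # Rocchio-style refinement: move the query toward the weighted mean of
    # "relevant" shown images and away from the weighted mean of the rest.
    w = np.asarray(weights, dtype=float)
    pos = (w[:, None] * shown).sum(axis=0) / max(w.sum(), 1e-9)
    neg_w = 1.0 - w
    neg = (neg_w[:, None] * shown).sum(axis=0) / max(neg_w.sum(), 1e-9)
    return alpha * query + beta * pos - gamma * neg

def rank(query, database):
    # Return database indices ordered by increasing Euclidean distance to the query.
    distances = np.linalg.norm(database - query, axis=1)
    return np.argsort(distances)

# Toy 2-D "image features": two clusters plus one ambiguous item.
database = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.5, 0.5]])
query = np.array([0.5, 0.5])
shown_idx = [0, 2]                      # images displayed to the user
weights = gaze_relevance([100, 800])    # user dwelt much longer on image 2
new_query = rocchio_update(query, database[shown_idx], weights)
order = rank(new_query, database)
print(order.tolist())  # [2, 3, 4, 1, 0]
```

Here the long dwell on image 2 pulls the query toward the cluster containing images 2 and 3, so they are ranked first. Soft weights, rather than a hard relevant/non-relevant split, reflect that gaze gives only implicit and noisy evidence of relevance.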