Context-based video retrieval system for the life-log applications
MIR '03: Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval
Many people now prefer to record their daily experiences in multimedia form rather than in a written diary. We have therefore developed the Life Log system to record and manage such experiences efficiently. Content-based features extracted from audiovisual data are needed to detect significant scenes in everyday life; one important class of scenes is conversations, which carry useful information. However, content-based features alone cannot satisfy users' preferences. This paper demonstrates a novel concept in video retrieval: integrating the content of video data with context captured by the various sensors in the Life Log system. We extract salient features from the audiovisual data and context features from wearable devices to detect interesting conversations. The experiments show conversation-scene detection based on audiovisual features, with the additional context supporting a more semantic analysis of conversation scenes.
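The content-plus-context idea described above can be sketched as a simple scoring scheme. This is a hypothetical illustration, not the paper's actual method: the segment fields, weights, and threshold are all assumptions chosen only to show how an audio-based content cue could be fused with wearable-sensor context cues to flag conversation scenes.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    """One short time slice of Life Log data (all fields are illustrative)."""
    speech_score: float   # content cue: likelihood of speech in the audio, 0..1
    motion: float         # context cue: accelerometer activity level, 0..1
    face_detected: bool   # context cue: a conversation partner is visible

def conversation_score(seg: Segment) -> float:
    """Fuse content and context cues into one score (weights are illustrative)."""
    score = 0.6 * seg.speech_score                       # audio content dominates
    score += 0.3 * (1.0 if seg.face_detected else 0.0)   # partner present
    score += 0.1 * (1.0 - seg.motion)                    # conversations tend to be low-motion
    return score

def detect_conversations(segments: List[Segment], threshold: float = 0.5) -> List[int]:
    """Return indices of segments likely to belong to a conversation scene."""
    return [i for i, s in enumerate(segments) if conversation_score(s) >= threshold]

# Example: strong speech with a visible partner scores high; noisy motion
# without speech scores low.
segments = [
    Segment(speech_score=0.9, motion=0.1, face_detected=True),
    Segment(speech_score=0.1, motion=0.8, face_detected=False),
    Segment(speech_score=0.7, motion=0.2, face_detected=True),
]
print(detect_conversations(segments))  # → [0, 2]
```

In a real system the speech score would come from an audio classifier and the context cues from the wearable sensors; the point of the sketch is only that context terms can veto or reinforce what the audio alone suggests.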