Clustering of Imperfect Transcripts Using a Novel Similarity Measure

Authors:
Oktay Ibrahimov, Ishwar K. Sethi, Nevenka Dimitrova
Venue:
Information Retrieval Techniques for Speech Applications (a book based on the workshop "Information Retrieval Techniques for Speech Applications", held as part of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, USA, September 2001).
Year:
2001

Abstract

Interest in methods for automatically generating content indices for multimedia documents, particularly video and audio documents, has surged over the last several years. As a result, there is considerable interest in methods for analyzing documents transcribed from audio and video broadcasts, telephone conversations, and messages. This paper addresses such analysis by presenting a clustering technique that partitions a set of transcribed documents into meaningful topics. Our method determines the intersection between matching transcripts, evaluates the information contributed by each transcript, assesses the information closeness of overlapping words, and calculates similarity using the chi-square method. The main novelty of our method lies in the proposed similarity measure, which is designed to withstand the imperfections of transcribed documents. Experimental results on documents with varying transcription quality are presented to demonstrate the efficacy of the proposed methodology.
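The abstract names the ingredients of the similarity measure but not its formula. As a rough illustration only, the Python sketch below combines a standard Pearson chi-square test of homogeneity over the words two transcripts share with a simple threshold-based single-link grouping. The functions `chi_square`, `similarity`, and `cluster`, the vocabulary-overlap weighting, and the threshold value are hypothetical choices made for this sketch. It does not model the per-transcript information contribution or the closeness of overlapping words, and it is not the authors' measure.

```python
from collections import Counter


def chi_square(counts_a: Counter, counts_b: Counter, shared: list[str]) -> float:
    """Pearson chi-square statistic of homogeneity for the 2 x k table whose
    rows are the two transcripts and whose columns are their shared words."""
    total_a = sum(counts_a[w] for w in shared)
    total_b = sum(counts_b[w] for w in shared)
    grand = total_a + total_b
    stat = 0.0
    for w in shared:
        column = counts_a[w] + counts_b[w]
        for observed, row_total in ((counts_a[w], total_a), (counts_b[w], total_b)):
            expected = row_total * column / grand  # always > 0 for shared words
            stat += (observed - expected) ** 2 / expected
    return stat


def similarity(doc_a: list[str], doc_b: list[str]) -> float:
    """Hypothetical similarity in [0, 1]: vocabulary overlap (Jaccard) scaled
    down as the shared words' proportions diverge (larger chi-square)."""
    counts_a, counts_b = Counter(doc_a), Counter(doc_b)
    shared = sorted(counts_a.keys() & counts_b.keys())  # transcript intersection
    if not shared:
        return 0.0
    overlap = len(shared) / len(counts_a.keys() | counts_b.keys())
    return overlap / (1.0 + chi_square(counts_a, counts_b, shared))


def cluster(docs: list[list[str]], threshold: float) -> list[list[int]]:
    """Single-link grouping: link every pair whose similarity reaches the
    (hypothetical) threshold, then return connected components via union-find."""
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if similarity(docs[i], docs[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy, already-tokenized "transcripts" (made up for illustration).
docs = [
    "the senate passed the budget bill today".split(),
    "senate budget bill passed after long debate".split(),
    "storm hits coast heavy rain and wind".split(),
    "heavy rain and strong wind batter the coast".split(),
]
print(cluster(docs, threshold=0.3))  # → [[0, 1], [2, 3]]
```

With a threshold of 0.3, the two budget transcripts and the two storm transcripts form separate clusters. A low chi-square statistic means the shared words occur in similar proportions in both transcripts; scaling by vocabulary overlap keeps a single incidental shared word (here "the", linking documents 0 and 3) from producing a high score.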