Towards robust features for classifying audio in the CueVideo system

Authors:
Savitha Srinivasan; Dragutin Petkovic; Dulce Ponceleon
Affiliations:
IBM Almaden Research Center, 650 Harry Road, San Jose, CA; IBM Almaden Research Center, 650 Harry Road, San Jose, CA; IBM Almaden Research Center, 650 Harry Road, San Jose, CA
Venue:
MULTIMEDIA '99 Proceedings of the seventh ACM international conference on Multimedia (Part 1)
Year:
1999



Abstract

Audio plays an increasingly important role in multimedia applications that involve video. Many efforts in this area focus either on audio with built-in semantic structure, such as broadcast news, or on classifying audio that contains a single type of sound, such as clear speech or clear music only. In the CueVideo system, we detect and classify mixed audio, i.e., combinations of speech and music together with other types of background sounds. Segmentation of mixed audio has applications such as detecting story boundaries in video, spoken document retrieval, and audio retrieval. We modify and combine audio features known to be effective in distinguishing speech from music, and examine their behavior on mixed audio. Our preliminary experimental results show that we can achieve a classification accuracy of over 80% for such mixed audio. Our study also provides several helpful insights into analyzing mixed audio in the context of real applications.
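
The abstract does not name the features that were modified and combined. As a rough illustration only, the sketch below computes two frame-level features commonly used in speech/music discrimination, zero-crossing rate and RMS energy, on synthetic tones. The choice of features, the function names, the frame length, and the hop size are all assumptions made for this example and are not taken from the CueVideo system.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; an incomplete tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def frame_features(samples, frame_len=400, hop=200):
    """Return one (zero-crossing rate, RMS energy) pair per frame.

    frame_len and hop are hypothetical defaults (25 ms and 12.5 ms at 16 kHz).
    """
    return [(zero_crossing_rate(f), rms_energy(f))
            for f in frame_signal(samples, frame_len, hop)]


# Synthetic stand-ins for real audio: a low and a high pure tone at 16 kHz.
RATE = 16000
low_tone = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(1600)]
high_tone = [math.sin(2 * math.pi * 3000 * n / RATE) for n in range(1600)]

low_feats = frame_features(low_tone)
high_feats = frame_features(high_tone)
```

In a setup like this, per-frame feature pairs would typically be fed to a classifier or thresholded over longer windows. How the CueVideo system actually combines its features is not described in the abstract.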