Audio plays an increasingly important role in multimedia applications involving video. Many efforts in this area focus on audio that contains some built-in semantic structure, such as broadcast news, or on classifying audio that contains a single type of sound, such as clear speech or clear music only. In the CueVideo system, we detect and classify mixed audio, i.e., combinations of speech and music together with other types of background sounds. Segmentation of mixed audio has applications in detecting story boundaries in video, spoken document retrieval, and audio retrieval, among others. We modify and combine audio features known to be effective in distinguishing speech from music, and examine their behavior on mixed audio. Our preliminary experimental results show a classification accuracy of over 80% for such mixed audio. Our study also provides several helpful insights into analyzing mixed audio in the context of real applications.
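The abstract does not name the features used, so the following is only an illustrative sketch of the general idea, not the CueVideo method. It computes two frame-level measures that are widely used in speech/music discrimination, zero-crossing rate and short-time energy, and summarizes how much they vary over a segment. The function names, frame size, hop size, and the synthetic test signals are all assumptions made for the example.

```python
import math

def tone(freq, n_samples, rate=8000, phase=0.1):
    """Synthetic sine tone; the phase offset avoids samples landing exactly on zero."""
    return [math.sin(2 * math.pi * freq * n / rate + phase) for n in range(n_samples)]

def frames(signal, size, hop):
    """Split a list of samples into overlapping fixed-size frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def segment_features(signal, size=400, hop=200):
    """Summarize frame-level features over a whole segment (hypothetical helper)."""
    fs = frames(signal, size, hop)
    zcr = [zero_crossing_rate(f) for f in fs]
    energy = [short_time_energy(f) for f in fs]
    return {
        "zcr_mean": sum(zcr) / len(zcr),
        "zcr_var": variance(zcr),
        "energy_var": variance(energy),
    }

# Assumed stand-ins for real audio: a steady tone versus a signal that
# alternates between a low and a high tone, crudely imitating the
# alternation of voiced and unvoiced sounds in speech.
steady = segment_features(tone(440, 8000))
alternating_signal = []
for k in range(10):
    alternating_signal += tone(200 if k % 2 == 0 else 2000, 800)
varied = segment_features(alternating_signal)
```

On these synthetic signals the alternating input shows a much larger zero-crossing-rate variance than the steady tone. Real mixed audio would be far less clear-cut, which is presumably why the abstract speaks of modifying and combining several features.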