Inspired by classical text document analysis built on the concept of (key) words, this paper presents an unsupervised approach to discovering (key) audio elements in general audio documents. The (key) audio elements can be considered the equivalents of the text (key) words, and they enable content-based audio analysis and retrieval by analogy to proven text analysis theories and methods. Since general audio signals usually show complicated and strongly varying distributions and densities in the feature space, we propose an iterative spectral clustering method with context-dependent scaling factors to decompose an audio data stream into audio elements. Using this clustering method, temporal signal segments with similar low-level features are grouped into natural clusters that we adopt as audio elements. To detect those audio elements that are most representative of the semantic content, that is, the key audio elements, two cases are considered. First, if only one audio document is available for analysis, a number of heuristic importance indicators are defined and employed to detect the key audio elements. When multiple audio documents are available, more sophisticated measures of audio element importance are proposed, including expected term frequency (ETF), expected inverse document frequency (EIDF), expected term duration (ETD), and expected inverse document duration (EIDD). Our experiments showed encouraging results regarding the quality of the obtained (key) audio elements and their potential applicability to content-based audio document analysis and retrieval.
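The pipeline the abstract describes can be sketched in code. The fragment below is a minimal illustration, not the paper's implementation: the "context-dependent scaling factors" are approximated here by local scaling (each segment's kernel bandwidth set to its distance to its k-th nearest neighbour), the iterative refinement step is omitted, and the key-element score is a simplified ETF × EIDF product over per-document occurrence counts (the duration-based ETD/EIDD terms are left out). All function names and parameters are illustrative assumptions.

```python
import numpy as np

def local_scaling_affinity(X, k=3):
    """Affinity matrix with context-dependent (local) scaling factors.

    X: (n_segments, n_features) low-level features of temporal segments.
    """
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # sigma_i = distance to the k-th nearest neighbour of segment i
    sigma = np.sort(D, axis=1)[:, k]
    A = np.exp(-(D ** 2) / (sigma[:, None] * sigma[None, :] + 1e-12))
    np.fill_diagonal(A, 0.0)
    return A

def spectral_embed(A, n_clusters):
    """Row-normalized top eigenvectors of the normalized affinity matrix."""
    d = A.sum(1)
    L = np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)
    _, vecs = np.linalg.eigh(L)          # eigenvalues ascending
    V = vecs[:, -n_clusters:]            # keep the top n_clusters eigenvectors
    return V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)

def kmeans(V, k, iters=50):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [V[0]]
    for _ in range(k - 1):
        dist = np.min([((V - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(V[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((V[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = V[labels == j].mean(0)
    return labels

def key_element_scores(doc_counts):
    """Simplified ETF x EIDF importance per audio element.

    doc_counts: (n_documents, n_elements) occurrence counts of each
    audio element (cluster) in each audio document.
    """
    tf = doc_counts / (doc_counts.sum(1, keepdims=True) + 1e-12)
    df = (doc_counts > 0).sum(0)
    idf = np.log(len(doc_counts) / (df + 1e-12))
    return tf.mean(0) * idf
```

With this scoring, an element that occurs in every document gets an IDF of zero, while an element concentrated in few documents scores high; the clustering step groups segments with similar features into the audio elements being scored.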