Speech audio retrieval using voice query

Authors:
Chotirat Ann Ratanamahatana;Phubes Tohlong
Affiliations:
Dept. of Computer Engineering, Chulalongkorn University, Bangkok, Thailand;Dept. of Computer Engineering, Chulalongkorn University, Bangkok, Thailand
Venue:
ICADL'06 Proceedings of the 9th international conference on Asian Digital Libraries: achievements, Challenges and Opportunities
Year:
2006

Citing 5
Cited 1

Warping indexes with envelope transforms for query by humming

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Query by humming: in action with its technology revealed

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Structural Representation of Speech for Phonetic Classification

ICPR '04 Proceedings of the Pattern Recognition, 17th International Conference on (ICPR'04) Volume 3 - Volume 03
A segment-based audio-visual speech recognizer: data collection, development, and initial experiments

Proceedings of the 6th international conference on Multimodal interfaces
Searching the Web by voice

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 2

On nonmetric similarity search problems in complex domains

ACM Computing Surveys (CSUR)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multimedia data has increasingly become a prevalent resource in Digital Library system; this includes audio, video, and image archives. However, each type of these data may need specific tools to help facilitate effective and efficient retrieval tasks. In this paper, we focus on retrieval of speech audio collection, which includes audio books, speech recordings, interviews, and lectures. Currently, most of the audio retrieval systems are based on keyword/title/author search typed into the system by users. The system then searches for particular keywords and gives a list of entire audio files that potentially are relevant to the query. Nonetheless, browsing audio content for particular section of the audios without knowing the actual content is yet a very difficult task. Moreover, since audio transcription or keyword annotation is very labor intensive and becomes infeasible for large data, we introduce here a preliminary framework that locates subsections of the audio that correspond to the voice query made by a user. We demonstrate a utility of our approach on query retrieval tasks in various types of audio recordings. We also show that this simple framework can potentially help retrieve and locate the voice query within the audio accurately and efficiently.