PodCastle: a spoken document retrieval system for podcasts and its performance improvement by anonymous user contributions

  • Authors:
  • Jun Ogata; Masataka Goto

  • Affiliations:
  • National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan; National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan

  • Venue:
  • SSCS '09: Proceedings of the Third Workshop on Searching Spontaneous Conversational Speech
  • Year:
  • 2009

Abstract

We have developed a Web 2.0 service, PodCastle, that enables full-text search of speech data (podcasts) on the basis of automatic speech recognition. PodCastle lets users search and read podcasts and share the full-text speech recognition results for them. However, even state-of-the-art speech recognizers cannot transcribe podcasts without errors, because the content and recording environments of podcasts vary widely. PodCastle therefore encourages users to cooperate by correcting speech recognition errors so that podcasts can be searched more reliably. Furthermore, the resulting corrections are used to train our speech recognizer, which provides a mechanism for gradually improving speech recognition performance. From our experience operating the service in practice over the past 30 months (since December 2006), we confirmed that the performance of PodCastle was improved by contributions from a number of anonymous users.
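
The loop the abstract describes (index recognized text, let users correct errors, keep the corrections for later recognizer training) can be illustrated with a small sketch. This is not PodCastle's implementation: the class `TranscriptIndex`, its methods, the episode ID, and the sample sentence are all hypothetical, and a plain word-level inverted index stands in for whatever retrieval method the real service uses.

```python
from collections import defaultdict


class TranscriptIndex:
    """Toy inverted index over recognized transcripts (hypothetical sketch)."""

    def __init__(self):
        self.transcripts = {}             # episode_id -> list of words
        self.postings = defaultdict(set)  # word -> set of episode_ids
        self.corrections = []             # log of user corrections

    def add_episode(self, episode_id, recognized_text):
        """Store a recognizer output and index each of its words."""
        words = recognized_text.lower().split()
        self.transcripts[episode_id] = words
        for word in words:
            self.postings[word].add(episode_id)

    def correct(self, episode_id, position, new_word):
        """Apply a user's one-word correction and update the index."""
        words = self.transcripts[episode_id]
        old_word = words[position]
        new_word = new_word.lower()
        words[position] = new_word
        # Drop the old posting only if the word no longer occurs in the episode.
        if old_word not in words:
            self.postings[old_word].discard(episode_id)
        self.postings[new_word].add(episode_id)
        # Assumption: pairs like this could later serve as recognizer training data.
        self.corrections.append((episode_id, position, old_word, new_word))

    def search(self, query):
        """Return the set of episodes that contain every query word."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TranscriptIndex()
# Hypothetical recognizer output with one error ("whether" for "weather").
index.add_episode("ep1", "the whether forecast for tsukuba")
before = index.search("weather forecast")
index.correct("ep1", 1, "weather")
after = index.search("weather forecast")
```

In this sketch the query misses the episode until the correction is applied, which is the sense in which user corrections make search more reliable; the `corrections` log only hints at the training step, which the abstract does not describe in detail.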