Thematic indexing of spoken documents by using self-organizing maps

  • Authors: Mikko Kurimo

  • Affiliations: Neural Networks Research Centre, Helsinki University of Technology, P.O. Box 5400, Konemiehentie 2, 02150 Espoo, Finland

  • Venue: Speech Communication
  • Year: 2002

Abstract

A method is presented for building a searchable index for spoken audio documents. The task differs from traditional text document indexing because large audio databases are decoded by automatic speech recognition, and decoding errors occur frequently. The idea in this paper is to take advantage of the large size of the database and to select the best index terms for each document with the help of the other documents that lie close to it in a semantic vector space. First, the audio stream is converted into a text stream by a speech recognizer. Then the text of each story is represented in a vector space as a document vector, which is the normalized sum of the word vectors in the story. A large collection of such document vectors is used to train a self-organizing map (SOM) to find latent semantic structures in the collection. Because the stories in spoken news are short and include speech recognition errors, the document vectors are smoothed using the semantic clusters determined by the SOM to enhance the indexing. The application in this paper is the indexing and retrieval of broadcast news on radio and television. Test results are given using the evaluation data from the Text REtrieval Conference (TREC) spoken document retrieval (SDR) task.
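
The pipeline described in the abstract (document vectors as normalized sums of word vectors, a SOM trained on those vectors, and smoothing with the SOM clusters) can be illustrated with a minimal sketch. This is not the paper's implementation. The function names, the one-dimensional map, the learning-rate and neighbourhood schedules, the toy word vectors, and the smoothing rule (blending a document with its single best-matching unit using a hypothetical `weight` parameter) are all assumptions made for illustration only.

```python
import math
import random


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def document_vector(words, word_vectors, dim):
    """Normalized sum of the word vectors of the words in one story."""
    total = [0.0] * dim
    for w in words:
        vec = word_vectors.get(w)
        if vec is not None:  # words without a vector are skipped
            total = [t + x for t, x in zip(total, vec)]
    return normalize(total)


def best_matching_unit(doc, units):
    """Index of the map unit closest to doc (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(doc, units[i])))


def train_som(docs, n_units, epochs=50, seed=0):
    """Train a 1-D SOM; schedules below are illustrative assumptions."""
    rng = random.Random(seed)
    units = [list(rng.choice(docs)) for _ in range(n_units)]
    steps = epochs * len(docs)
    for t in range(steps):
        frac = t / steps
        lr = 0.5 * (1.0 - frac)                             # decaying learning rate
        radius = max(0.5, (n_units / 2.0) * (1.0 - frac))   # shrinking neighbourhood
        doc = docs[rng.randrange(len(docs))]
        bmu = best_matching_unit(doc, units)
        for i, unit in enumerate(units):
            h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
            units[i] = [u + lr * h * (d - u) for u, d in zip(unit, doc)]
    return units


def smooth(doc, units, weight=0.5):
    """Assumed smoothing rule: blend doc with its best-matching SOM unit."""
    unit = units[best_matching_unit(doc, units)]
    return normalize([(1 - weight) * d + weight * u for d, u in zip(doc, unit)])


if __name__ == "__main__":
    # Toy word vectors (hypothetical); real systems use far higher dimensions.
    toy_vectors = {"election": [1.0, 0.0, 0.1], "vote": [0.9, 0.1, 0.0],
                   "storm": [0.0, 1.0, 0.1], "rain": [0.1, 0.9, 0.0]}
    stories = [["election", "vote"], ["vote", "vote", "rain"],
               ["storm", "rain"], ["rain", "storm", "election"]]
    docs = [document_vector(s, toy_vectors, 3) for s in stories]
    som = train_som(docs, n_units=2)
    print(smooth(docs[1], som))
```

In this sketch the smoothed vector is pulled toward the map unit representing similar stories, so a word misrecognized in one short story has less influence on that story's position in the vector space. The blend weight controls how strongly the cluster overrides the original document.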