Topic-Sensitive Language Modelling

  • Authors:
  • Mirjam Sepesy Maucec;Zdravko Kacic

  • Affiliations:
  • -;-

  • Venue:
  • TDS '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

The paper proposes a new framework to construct topic-sensitive language models for large vocabulary speech recognition. Identifying a domain of discourse, a model appropriate for the current domain can be built. In our experiments, the target domain was represented with a piece of text. By using appropriate features, sub-corpus of a large collection of training text was extracted. Our feature selection process was especially suited to languages where words are formed by many different inflectional affixatation. All words with the same meaning (but different grammatical form) were collected in one cluster and represented as one feature. We used the heuristicword weighting classifier TFIDF (term frequency / inverse document frequency) to further shrink the feature vector. Final language model was built by interpolation of topic specific models and a general model. Experiments have been done by using English and Slovenian corpus.