Recent topics in speech recognition research at NTT laboratories

  • Authors:
  • Sadaoki Furui, Kiyohiro Shikano, Shoichi Matsunaga, Tatsuo Matsuoka, Satoshi Takahashi, Tomokazu Yamada

  • Affiliation:
  • NTT Human Interface Laboratories, Midori-cho, Musashino-shi, Tokyo, Japan (all authors)

  • Venue:
  • HLT '91 Proceedings of the workshop on Speech and Natural Language
  • Year:
  • 1992

Abstract

This paper introduces three recent topics in speech recognition research at NTT (Nippon Telegraph and Telephone) Human Interface Laboratories.

The first topic is a new HMM (hidden Markov model) technique that uses VQ-code bigrams to constrain the output probability distribution of the model according to the VQ codes of previous frames. Because the output probability distribution changes with the previous frames even within the same state, this method reduces the overlap between the feature distributions of different phonemes.

The second topic is approaches for adapting a syllable-trigram model to a new task in Japanese continuous speech recognition. An approach that uses the most recent input phrases for adaptation is effective in reducing perplexity and improving phrase recognition rates.

The third topic is stochastic language models for sequences of Japanese characters, to be used in a Japanese dictation system with unlimited vocabulary. Japanese text is written with Kanji (Chinese characters) and Kana (Japanese syllabaries), and each Kanji has several readings depending on context. Our dictation system uses character-trigram probabilities, estimated from a text database containing both Kanji and Kana, as its source model, and generates Kanji-and-Kana sequences directly from input speech.
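
As a rough illustration of the first topic, the sketch below contrasts a conventional discrete-HMM output distribution with one conditioned on the VQ code of the previous frame (a VQ-code bigram constraint), so that the emission probability in a given state changes with the preceding frame. The codebook size, state count, and random parameters are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Hypothetical sizes: a small VQ codebook and a 3-state phoneme HMM.
NUM_CODES = 8       # VQ codebook size (toy value)
NUM_STATES = 3

rng = np.random.default_rng(0)

# Conventional discrete HMM: one output distribution per state,
#   b_plain[j, c] = P(code c | state j)
b_plain = rng.dirichlet(np.ones(NUM_CODES), size=NUM_STATES)

# Bigram-constrained HMM: the output distribution in state j also depends
# on the VQ code of the previous frame,
#   b_bigram[j, c_prev, c] = P(code c | state j, previous code c_prev)
b_bigram = rng.dirichlet(np.ones(NUM_CODES), size=(NUM_STATES, NUM_CODES))

def emission_prob(state, prev_code, code):
    """Output probability under the VQ-code-bigram-constrained model."""
    if prev_code is None:            # first frame: no previous code available
        return b_plain[state, code]  # fall back to the unconditional distribution
    return b_bigram[state, prev_code, code]

# Score a toy VQ-code sequence while staying in one state: the probability of
# each frame's code changes with the previous frame's code, even within a state.
codes = [2, 5, 5, 1]
prev = None
log_prob = 0.0
for c in codes:
    log_prob += np.log(emission_prob(state=1, prev_code=prev, code=c))
    prev = c
print(f"log output probability in state 1: {log_prob:.3f}")
```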
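
For the second topic, the abstract reports that adapting the syllable trigram model on the most recent input phrases reduces perplexity. One plausible way to realize this, sketched below, is to interpolate a fixed base trigram model with counts accumulated from a cache of recent phrases. The interpolation weight, add-one smoothing, cache size, and toy syllable data are assumptions, not the authors' actual adaptation scheme.

```python
import math
from collections import Counter, deque

class AdaptiveTrigram:
    """Base trigram LM interpolated with counts from the most recent phrases."""

    def __init__(self, base_counts, vocab, lam=0.7, max_phrases=100):
        self.base = base_counts                    # Counter of (s1, s2, s3) trigrams
        self.base_ctx = Counter()
        for (s1, s2, _s3), n in base_counts.items():
            self.base_ctx[(s1, s2)] += n
        self.V = len(vocab)
        self.lam = lam                             # weight on the base model
        self.recent = deque(maxlen=max_phrases)    # cache of recent input phrases
        self.cache = Counter()
        self.cache_ctx = Counter()

    def adapt(self, phrase):
        """Add one recognized phrase (a list of syllables) to the cache."""
        self.recent.append(phrase)
        self.cache.clear()
        self.cache_ctx.clear()
        for p in self.recent:
            for tri in zip(p, p[1:], p[2:]):
                self.cache[tri] += 1
                self.cache_ctx[tri[:2]] += 1

    def prob(self, tri):
        """Add-one-smoothed base and cache probabilities, linearly interpolated."""
        p_base = (self.base[tri] + 1) / (self.base_ctx[tri[:2]] + self.V)
        p_cache = (self.cache[tri] + 1) / (self.cache_ctx[tri[:2]] + self.V)
        return self.lam * p_base + (1 - self.lam) * p_cache

    def perplexity(self, phrase):
        trigrams = list(zip(phrase, phrase[1:], phrase[2:]))
        logp = sum(math.log(self.prob(t)) for t in trigrams)
        return math.exp(-logp / len(trigrams))

# Toy usage with romanized syllables (purely illustrative).
vocab = {"ko", "re", "wa", "te", "su", "to", "de"}
lm = AdaptiveTrigram(Counter({("ko", "re", "wa"): 3}), vocab)
phrase = ["ko", "re", "wa", "te", "su", "to", "de", "su"]
print("perplexity before adaptation:", round(lm.perplexity(phrase), 1))
lm.adapt(phrase)                  # adapt on the most recent input phrase
print("perplexity after adaptation:", round(lm.perplexity(phrase), 1))
```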
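
For the third topic, here is a minimal sketch of how character-trigram probabilities estimated from a Kanji-and-Kana text corpus could serve as a source model for ranking candidate character sequences. The tiny training text, the candidate pair, and the add-one smoothing are illustrative assumptions.

```python
import math
from collections import Counter

def train_char_trigrams(text):
    """Count character trigrams and their two-character contexts in a corpus."""
    padded = "##" + text                      # '#' marks the start of the text
    tri = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    ctx = Counter(padded[i:i + 2] for i in range(len(padded) - 2))
    return tri, ctx

def sequence_log_prob(seq, tri, ctx, vocab_size):
    """Add-one-smoothed character-trigram log probability of a character sequence."""
    padded = "##" + seq
    logp = 0.0
    for i in range(len(padded) - 2):
        t = padded[i:i + 3]
        logp += math.log((tri[t] + 1) / (ctx[t[:2]] + vocab_size))
    return logp

# Tiny illustrative training text mixing Kanji and Kana (not the paper's database).
corpus = "音声認識の研究を行う。音声を認識する。"
tri, ctx = train_char_trigrams(corpus)
V = len(set(corpus)) + 1                      # +1 leaves room for unseen characters

# Two candidate Kanji-and-Kana character sequences for the same reading
# (a hypothetical homophone-style ambiguity); the model prefers the one
# whose character trigrams were seen in the training text.
for candidate in ("音声を認識する", "音声を任式する"):
    print(candidate, round(sequence_log_prob(candidate, tri, ctx, V), 2))
```

In the system described by the abstract, such character-trigram scores would be combined with the acoustic evidence so that Kanji-and-Kana sequences are generated directly from the input speech.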