Japanese lexical accent recognition for a CALL system by deriving classification equations with perceptual experiments

  • Authors:
  • Greg Short; Keikichi Hirose; Nobuaki Minematsu

  • Venue:
  • Speech Communication
  • Year:
  • 2013

Abstract

For non-native learners of Japanese, pitch accent can be difficult to acquire without proper instruction. A Computer Assisted Language Learning (CALL) system could aid these learners, provided that it can generate helpful feedback from automatic analysis of the learner's utterance. Such a system must take into account that a learner's Japanese production is strongly influenced by his or her native language; for example, non-native speakers may produce pitch contours that native speakers do not. A standard approach to recognition for error detection is to apply a machine learning algorithm to a vector of assorted features. However, a method motivated by perceptual analysis may be better suited to a CALL system. Such a method should make it possible to better understand the human recognition process and the causal relationships between contour and perception, which could be useful for feedback. Moreover, since accent recognition is a perceptual process, such a method may also improve automatic recognition of non-native speech. We therefore construct a method from listening tests that use resynthesized speech. First, we examine which variables the probability of a pitch level transition depends on and, from this, derive equations that calculate the probability at the disyllable level. Then, to recognize the word-level pattern, we determine the location of each transition from the probabilities for each two-syllable pair. This method makes it possible to recognize all pitch patterns and to give more in-depth feedback. In recognition experiments using these equations, the method performs comparably to the inter-labeler agreement rate and outperforms SVM-based methods on non-native speech.
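
The two-stage idea in the abstract (a transition probability for each disyllable, then a word-level pattern read off from those probabilities) can be illustrated with a minimal Python sketch. The abstract does not give the derived equations, so everything concrete here is an assumption made for illustration: the logistic form, the single input variable (the F0 difference between adjacent syllables, in semitones), the `threshold` and `slope` values, and the function names `pair_probabilities` and `word_transitions`. It should not be read as the authors' method.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def pair_probabilities(diff_st, threshold=1.5, slope=2.0):
    """Hypothetical stand-in for the paper's disyllable-level equations.

    diff_st is the F0 of the second syllable minus the F0 of the first,
    in semitones. threshold and slope are illustrative values, not
    parameters taken from the paper.
    """
    p_rise = logistic(slope * (diff_st - threshold))
    p_fall = logistic(-slope * (diff_st + threshold))
    # With threshold > 0 the two probabilities above sum to less than 1,
    # so the remainder is assigned to "no transition".
    p_flat = 1.0 - p_rise - p_fall
    return {"rise": p_rise, "fall": p_fall, "flat": p_flat}


def word_transitions(syllable_f0_st):
    """Label every adjacent syllable pair with its most probable transition."""
    labels = []
    for prev, cur in zip(syllable_f0_st, syllable_f0_st[1:]):
        probs = pair_probabilities(cur - prev)
        labels.append(max(probs, key=probs.get))
    return labels


# Four syllables with made-up F0 values (semitones relative to the first).
print(word_transitions([0.0, 4.0, 4.2, -1.0]))  # ['rise', 'flat', 'fall']
```

Keeping the per-pair probabilities rather than only the final label is what would let a CALL system report where a transition was heard and how confidently, which is the kind of in-depth feedback the abstract refers to.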