Speech interface VLSI for car applications

Authors:
M. Shozakai
Affiliations:
LSI Labs., Asahi Chem. Ind. Co. Ltd., Kanagawa, Japan
Venue:
ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 01
Year:
1999

Abstract

A user-friendly speech interface for car applications is strongly needed for safety reasons. This paper describes a speech interface VLSI designed for the car environment, providing speech recognition and speech compression/decompression functions. The chip has a heterogeneous architecture composed of an ADC/DAC, a DSP, a RISC core, hard-wired logic, and peripheral circuits. The DSP not only performs acoustic analysis and the output probability calculation of the HMMs used for speech recognition, but also handles speech compression/decompression. The RISC core, in turn, serves as the CPU of the whole chip and, with the aid of the hard-wired logic, as the Viterbi decoder. An algorithm is proposed that seamlessly recognizes a mixed vocabulary of speaker-independent fixed words and speaker-dependent user-defined words. It is based on acoustic event HMMs, which allow a template to be created from a single sample utterance. The proposed algorithm, as embedded in the chip, is evaluated, and promising results are shown for multiple languages.
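The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the chip's algorithm. It assumes that each acoustic event is modeled by a single HMM state, that per-frame emission log-likelihoods are already available (the role the abstract assigns to the DSP), and that a word is a left-to-right chain of events. A user-defined word template is created from one utterance by decoding it with a free event loop and keeping the resulting event sequence; fixed and user-defined words are then scored by the same Viterbi decoder. The function names, event labels, transition values, and synthetic scores are all hypothetical.

```python
import math

NEG_INF = float("-inf")


def viterbi_word(frame_scores, events, log_stay=math.log(0.5), log_move=math.log(0.5)):
    """Best log-score of a left-to-right chain with one state per acoustic event.

    frame_scores[t][event] is the emission log-likelihood of `event` at frame t.
    The path must start in the first state at frame 0 and end in the last state.
    """
    n, T = len(events), len(frame_scores)
    if n == 0 or T < n:
        return NEG_INF  # too few frames to visit every state once
    delta = [NEG_INF] * n
    delta[0] = frame_scores[0][events[0]]
    for t in range(1, T):
        new = [NEG_INF] * n
        for j in range(n):
            stay = delta[j] + log_stay
            move = delta[j - 1] + log_move if j > 0 else NEG_INF
            new[j] = max(stay, move) + frame_scores[t][events[j]]
        delta = new
    return delta[-1]


def make_template(frame_scores, events, log_switch=math.log(0.1)):
    """Hypothetical template creation: decode one utterance with a free event loop.

    Any event may follow any other; log_switch penalizes each change of event so
    that spurious one-frame events are discouraged. Returns the collapsed sequence.
    """
    delta = {e: frame_scores[0][e] for e in events}
    back = []  # back[t - 1][e] = best predecessor of event e at frame t
    for t in range(1, len(frame_scores)):
        best_prev = max(delta, key=delta.get)
        new, ptr = {}, {}
        for e in events:
            stay, switch = delta[e], delta[best_prev] + log_switch
            prev = best_prev if switch > stay else e
            new[e] = max(stay, switch) + frame_scores[t][e]
            ptr[e] = prev
        back.append(ptr)
        delta = new
    state = max(delta, key=delta.get)  # best final event
    path = [state]
    for ptr in reversed(back):  # backtrace to frame 0
        state = ptr[state]
        path.append(state)
    path.reverse()
    seq = [path[0]]
    for e in path[1:]:  # merge consecutive frames of the same event
        if e != seq[-1]:
            seq.append(e)
    return seq


def recognize(frame_scores, vocabulary):
    """Return the word (fixed or user-defined) whose event chain scores best."""
    return max(vocabulary, key=lambda w: viterbi_word(frame_scores, vocabulary[w]))


def synthetic_frames(labels, events, hit=0.0, miss=-5.0):
    """Toy emission scores: high for the labeled event, low for all others."""
    return [{e: (hit if e == lab else miss) for e in events} for lab in labels]


EVENTS = ["sil", "a", "b", "c"]
# Hypothetical speaker-independent fixed words, given as event sequences.
vocab = {"call": ["sil", "a", "b", "sil"], "radio": ["sil", "c", "a", "sil"]}
# A user-defined word registered from a single sample utterance.
utt = synthetic_frames(["sil", "sil", "c", "c", "b", "b", "sil"], EVENTS)
vocab["home"] = make_template(utt, EVENTS)
print(recognize(synthetic_frames(["sil", "c", "c", "b", "sil", "sil"], EVENTS), vocab))  # prints "home"
```

Because the user-defined template is just another event sequence, it sits in the same vocabulary as the fixed words and needs no separate recognizer. This is one plausible reading of "seamless" mixed-vocabulary recognition, offered for illustration only.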