Spoken Language Processing: A Guide to Theory, Algorithm, and System Development

Authors:
Xuedong Huang;Alex Acero;Hsiao-Wuen Hon;Raj Reddy
Affiliations:
Microsoft Research, Redmond, WA;Microsoft Research, Redmond, WA;Microsoft Research, Redmond, WA;-
Venue:
Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
Year:
2001

Citing 0
Cited 176

Strategies for Developing a Real-Time Continuous Speech Recognition System for Czech Language

TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue
On-line Signature Verification Using Local Shape Analysis

ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
Challenges in adopting speech recognition

Communications of the ACM - Multimodal interfaces that flex, adapt, and persist
Considerations in the usage of text to speech (TTS) in the creation of natural sounding voice enabled web systems

ISICT '03 Proceedings of the 1st international symposium on Information and communication technologies
An xpath-based discourse analysis module for spoken dialogue systems

Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
Research and developments of a multi-modal MIR engine for commercial applications in East Asia

Journal of the American Society for Information Science and Technology - Music information retrieval
Transferable videorealistic speech animation

Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation
Inferring body pose using speech content

ICMI '05 Proceedings of the 7th international conference on Multimodal interfaces
Accurate Visible Speech Synthesis Based on Concatenating Variable Length Motion Capture Data

IEEE Transactions on Visualization and Computer Graphics
Bootstrapping Named Entity recognition for Italian Broadcast News

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Voice activated command and control with speech recognition over WiFi

Science of Computer Programming - Special issue: Principles and practices of programming in Java (PPPJ 2004)
Speaker-independent 3D face synthesis driven by speech and text

Signal Processing - Fractional calculus applications in signals and systems
An active approach to spoken language processing

ACM Transactions on Speech and Language Processing (TSLP)
Robust scene recognition using language models for scene contexts

MIR '06 Proceedings of the 8th ACM international workshop on Multimedia information retrieval
Word graph based speech recognition error correction by handwriting input

Proceedings of the 8th international conference on Multimodal interfaces
A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA

Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
Chester: towards a personal medication advisor

Journal of Biomedical Informatics - Special issue: Dialog systems for health communications
The vocal joystick: a voice-based human-computer interface for individuals with motor impairments

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Pattern recognition methods for advanced stochastic protein sequence analysis using HMMs

Pattern Recognition
Accessing speech data using strategic fixation

Computer Speech and Language
Multimodal speaker/speech recognition using lip motion, lip texture and audio

Signal Processing - Special section: Multimodal human-computer interfaces
Optimization of Speech Recognition by Clustering of Phones

Fundamenta Informaticae - SPECIAL ISSUE ON CONCURRENCY SPECIFICATION AND PROGRAMMING (CS&P 2005) Ruciane-Nide, Poland, 28-30 September 2005
Chirp group delay analysis of speech signals

Speech Communication
Statistical query translation models for cross-language information retrieval

ACM Transactions on Asian Language Information Processing (TALIP)
Robust in-car speech recognition based on nonlinear multiple regressions

EURASIP Journal on Applied Signal Processing
Robust speech recognition using factorial HMMs for home environments

EURASIP Journal on Applied Signal Processing
Compensating acoustic mismatch using class-based histogram equalization for robust speech recognition

EURASIP Journal on Applied Signal Processing
A novel speech/noise discrimination method for embedded ASR system

EURASIP Journal on Applied Signal Processing
A model-selection-based self-splitting Gaussian mixture learning with application to speaker identification

EURASIP Journal on Applied Signal Processing
Comparing speaker-dependent and speaker-adaptive acoustic models for recognizing dysarthric speech

Proceedings of the 9th international ACM SIGACCESS conference on Computers and accessibility
Temporal filtering of visual speech for audio-visual speech recognition in acoustically and visually challenging environments

Proceedings of the 9th international conference on Multimodal interfaces
A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition

Speech Communication
Testing the performance of spoken dialogue systems by means of an artificially simulated user

Artificial Intelligence Review
Relative pitch tracking for singing voice as an application in query by humming systems

SPPR'07 Proceedings of the Fourth conference on IASTED International Conference: Signal Processing, Pattern Recognition, and Applications
Adapting speaking after evidence of misrecognition: Local and global hyperarticulation

Speech Communication
Review: Speaker segmentation and clustering

Signal Processing
Understanding speech utterances in Mandarin dialogue system

CEA'07 Proceedings of the 2007 annual Conference on International Conference on Computer Engineering and Applications
Ontology-based speech act identification in a bilingual dialog system using partial pattern trees

Journal of the American Society for Information Science and Technology
Tone-enhanced generalized character posterior probability (GCPP) for Cantonese LVCSR

Computer Speech and Language
Software for speech analysis

ICCOM'05 Proceedings of the 9th WSEAS International Conference on Communications
Limited-Vocabulary Estonian Continuous Speech Recognition System using Hidden Markov Models

Informatica
Acoustic Modelling for Croatian Speech Recognition and Synthesis

Informatica
Automated dialog systems for Romanian language

MMACTEE'08 Proceedings of the 10th WSEAS International Conference on Mathematical Methods and Computational Techniques in Electrical Engineering
Extended probabilistic HAL with close temporal association for psychiatric query document retrieval

ACM Transactions on Information Systems (TOIS)
An Improve to Human Computer Interaction, Recovering Data from Databases Through Spoken Natural Language

ISNN '07 Proceedings of the 4th international symposium on Neural Networks: Part II--Advances in Neural Networks
User-Customized Interactive System Using Both Speech and Face Recognition

ICESS '07 Proceedings of the 3rd international conference on Embedded Software and Systems
Speaker Recognition Using Temporal Decomposition of LSF for Mobile Environment

ICESS '07 Proceedings of the 3rd international conference on Embedded Software and Systems
Ubiquitous and Robust Text-Independent Speaker Recognition for Home Automation Digital Life

UIC '08 Proceedings of the 5th international conference on Ubiquitous Intelligence and Computing
The ASRS_RL --- A Research Platform for Spoken Language Recognition and Understanding Experiments

ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
Comparing Linear Feature Space Transformations for Correlated Features

PIT '08 Proceedings of the 4th IEEE tutorial and research workshop on Perception and Interactive Technologies for Speech-Based Systems: Perception in Multimodal Dialogue Systems
Boundary Refining Aiming at Speech Synthesis Applications

PROPOR '08 Proceedings of the 8th international conference on Computational Processing of the Portuguese Language
Evolutionary-Based Design of a Brazilian Portuguese Recording Script for a Concatenative Synthesis System

PROPOR '08 Proceedings of the 8th international conference on Computational Processing of the Portuguese Language
Evaluation of the robustness of the polynomial segment models to noisy environments with unsupervised adaptation

Speech Communication
Role recognition in multiparty recordings using social affiliation networks and discrete distributions

ICMI '08 Proceedings of the 10th international conference on Multimodal interfaces
Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition

Computer Speech and Language
Word and triphone based approaches in continuous speech recognition for Tamil language

WSEAS Transactions on Signal Processing
Robust Romanian language automatic speech recognizer based on multistyle training

WSEAS Transactions on Computer Research
Fast communication: Bernoulli versus Markov: Investigation of state transition regime in switching-state acoustic models

Signal Processing
User Verification by Combining Speech and Face Biometrics in Video

ISVC '08 Proceedings of the 4th International Symposium on Advances in Visual Computing, Part II
Computational method for segmentation and classification of ingestive sounds in sheep

Computers and Electronics in Agriculture
A multi-FPGA 10x-real-time high-speed search engine for a 5000-word vocabulary speech recognizer

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Improving GMM-UBM speaker verification using discriminative feedback adaptation

Computer Speech and Language
Substring Statistics

CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Research on Segment Acoustic Model Based Mandarin LVCSR

ISNN 2009 Proceedings of the 6th International Symposium on Neural Networks: Advances in Neural Networks - Part II
The e-Sentencias Prototype: A Procedural Ontology for Legal Multimedia Applications in the Spanish Civil Courts

Proceedings of the 2009 conference on Law, Ontologies and the Semantic Web: Channelling the Legal Information Flood
Acoustic Features Analysis for Recognition of Normal and Hypoacustic Infant Cry Based on Neural Networks

IWANN '03 Proceedings of the 7th International Work-Conference on Artificial and Natural Neural Networks: Part II: Artificial Neural Nets Problem Solving Methods
An overview of spoken language technology for education

Speech Communication
Is text-to-speech synthesis ready for use in computer-assisted language learning?

Speech Communication
Multilingual speech recognition for information retrieval in Indian context

HLT-SRWS '04 Proceedings of the Student Research Workshop at HLT-NAACL 2004
Efficient Parsing of Romanian Language for Text-to-Speech Purposes

TSD '09 Proceedings of the 12th International Conference on Text, Speech and Dialogue
Continuous speech recognition using modified stack decoding algorithm

AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 4
Automatic decision of piano fingering based on hidden Markov models

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
An overview of text-independent speaker recognition: From features to supervectors

Speech Communication
Speech recognition using augmented conditional random fields

IEEE Transactions on Audio, Speech, and Language Processing
Emphatic visual speech synthesis

IEEE Transactions on Audio, Speech, and Language Processing - Special issue on multimodal processing in speech-based interactions
Graph-based partial hypothesis fusion for pen-aided speech input

IEEE Transactions on Audio, Speech, and Language Processing - Special issue on multimodal processing in speech-based interactions
Time-frequency correlation-based missing-feature reconstruction for robust speech recognition in band-restricted conditions

IEEE Transactions on Audio, Speech, and Language Processing
Stereo-based stochastic mapping for robust speech recognition

IEEE Transactions on Audio, Speech, and Language Processing
Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model

ISVC '09 Proceedings of the 5th International Symposium on Advances in Visual Computing: Part I
Rethinking of computation for future-generation, knowledge-rich speech recognition and understanding

ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Text dependent speaker verification system using discriminative weighting method and Artificial Neural Networks

AIA '08 Proceedings of the 26th IASTED International Conference on Artificial Intelligence and Applications
Speech watermarking for analog flat-fading bandpass channels

IEEE Transactions on Audio, Speech, and Language Processing
Classification of non-speech human sounds: feature selection and snoring sound analysis

SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics
Spoken information extraction from Italian broadcast news

ECIR'03 Proceedings of the 25th European conference on IR research
Using morphossyntactic information in TTS systems: comparing strategies for European Portuguese

PROPOR'03 Proceedings of the 6th international conference on Computational processing of the Portuguese language
A new HMM-based ensemble generation method for numeral recognition

MCS'07 Proceedings of the 7th international conference on Multiple classifier systems
Flexible multi-modal interaction technologies and user interface specially designed for Chinese car infotainment system

HCI'07 Proceedings of the 12th international conference on Human-computer interaction: intelligent multimodal interaction environments
Exploring web scale language models for search query processing

Proceedings of the 19th international conference on World wide web
Pitch marks at peaks or valleys?

TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
The significance of empty speech pauses: cognitive and algorithmic issues

BVAI'07 Proceedings of the 2nd international conference on Advances in brain, vision and artificial intelligence
Emotional style conversion in the TTS system with cepstral description

COST 2102'07 Proceedings of the 2007 COST action 2102 international conference on Verbal and nonverbal communication behaviours
Cocktail party processing

WCCI'08 Proceedings of the 2008 IEEE world conference on Computational intelligence: research frontiers
On the inference of ancestries in admixed populations

RECOMB'08 Proceedings of the 12th annual international conference on Research in computational molecular biology
Using knowledge of misunderstandings to increase the robustness of spoken dialogue systems

Knowledge-Based Systems
Ergodic HMM-UBM system for on-line signature verification

BioID_MultiComm'09 Proceedings of the 2009 joint COST 2101 and 2102 international conference on Biometric ID management and multimodal communication
A method for Vietnamese text normalization to improve the quality of speech synthesis

Proceedings of the 2010 Symposium on Information and Communication Technology
Instance-based natural language generation

Natural Language Engineering
Latvian Text-to-Speech Synthesizer

Proceedings of the 2010 conference on Human Language Technologies -- The Baltic Perspective: Proceedings of the Fourth International Conference Baltic HLT 2010
A real-time FPGA-based 20 000-word speech recognizer with optimized DRAM access

IEEE Transactions on Circuits and Systems Part I: Regular Papers
Reverberation model-based decoding in the logmelspec domain for robust distant-talking speech recognition

IEEE Transactions on Audio, Speech, and Language Processing - Special issue on processing reverberant speech: methodologies and applications
Human augmented cognition based on integration of visual and auditory information

PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
Improving phone duration modelling using support vector regression fusion

Speech Communication
The integration of principal component analysis and cepstral mean subtraction in parallel model combination for robust speech recognition

Digital Signal Processing
Visual speech synthesis by modelling coarticulation dynamics using a non-parametric switching state-space model

International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
A tandem algorithm for pitch estimation and voiced speech segregation

IEEE Transactions on Audio, Speech, and Language Processing
Performing nonlinear blind source separation with signal invariants

IEEE Transactions on Signal Processing
Histogram equalization to model adaptation for robust speech recognition

EURASIP Journal on Advances in Signal Processing
Inferring search behaviors using partially observable Markov model with duration (POMD)

Proceedings of the fourth ACM international conference on Web search and data mining
Role of modulation magnitude and phase spectrum towards speech intelligibility

Speech Communication
Large vocabulary continuous speech recognition for Urdu

Proceedings of the 8th International Conference on Frontiers of Information Technology
An utterance recognition technique for keyword spotting by fusion of bark energy and MFCC features

SSIP '09/MIV'09 Proceedings of the 9th WSEAS international conference on signal, speech and image processing, and 9th WSEAS international conference on Multimedia, internet & video technologies
On the detection of pitch marks using a robust multi-phase algorithm

Speech Communication
The use of phase in complex spectrum subtraction for robust speech recognition

Computer Speech and Language
Robust speech recognition using spatial-temporal feature distribution characteristics

Pattern Recognition Letters
Automatic detection of edited parts in inexact transcribed corpora based on alignment between edited transcription and corresponding utterance

ROCOM'11/MUSP'11 Proceedings of the 11th WSEAS international conference on robotics, control and manufacturing technology, and 11th WSEAS international conference on Multimedia systems & signal processing
Development of a method for automatic basso continuo playing

Information Processing and Management: an International Journal
Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition

Journal of Signal Processing Systems
Contextual invariant-integration features for improved speaker-independent speech recognition

Speech Communication
Robust Romanian language automatic speech recognizer

CIMMACS'07 Proceedings of the 6th WSEAS international conference on Computational intelligence, man-machine systems and cybernetics
Eigen-model projections for protected on-line signature recognition

BioID'11 Proceedings of the COST 2101 European conference on Biometrics and ID management
Robust speech recognition in the car environment

LTC'09 Proceedings of the 4th conference on Human language technology: challenges for computer science and linguistics
Neuromorphic detection of vowel representation spaces

IWINAC'11 Proceedings of the 4th international conference on Interplay between natural and artificial computation: new challenges on bioinspired applications - Volume Part II
Impact of the approaches involved on word-graph derivation from the ASR system

IbPRIA'11 Proceedings of the 5th Iberian conference on Pattern recognition and image analysis
Speech compression based on frequency warped cepstrum and wavelet analysis

MCPR'11 Proceedings of the Third Mexican conference on Pattern recognition
Gaussian selection using self-organizing map for automatic speech recognition

WSOM'11 Proceedings of the 8th international conference on Advances in self-organizing maps
A bag-of-features-based framework for human activity representation and recognition

Proceedings of the 2011 international workshop on Situation activity & goal awareness
A Computational Model of Unsupervised Speech Segmentation for Correspondence Learning

Research on Language and Computation
Acoustic modeling problem for automatic speech recognition system: conventional methods (Part I)

International Journal of Speech Technology
Classification of speech dysfluencies with MFCC and LPCC features

Expert Systems with Applications: An International Journal
Speech enhancement using a minimum mean-square error short-time spectral modulation magnitude estimator

Speech Communication
Bio-inspired phonologic processing: from vowel representation spaces to categories

NOLISP'11 Proceedings of the 5th international conference on Advances in nonlinear speech processing
Robust text-independent speaker identification using hybrid PCA&LDA

MICAI'06 Proceedings of the 5th Mexican international conference on Artificial Intelligence
Using PCA to improve the generation of speech keys

MICAI'06 Proceedings of the 5th Mexican international conference on Artificial Intelligence
A straightforward method for automatic identification of marginalized languages

FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
Speaker recognition in unknown mismatched conditions using augmented PCA

ISCIS'05 Proceedings of the 20th international conference on Computer and Information Sciences
Motion primitive-based human activity recognition using a bag-of-features approach

Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
Heterogeneous centroid neural networks

ISNN'06 Proceedings of the Third international conference on Advances in Neural Networks - Volume Part I
Pitch mean based frequency warping

ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing
Consistent modeling of the static and time-derivative cepstrums for speech recognition using HSPTM

ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing
Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR

International Journal of Speech Technology
A survey of techniques for incremental learning of HMM parameters

Information Sciences: an International Journal
Phonetic sequence to graphemes conversion based on DTW and one-stage algorithms

PROPOR'06 Proceedings of the 7th international conference on Computational Processing of the Portuguese Language
A new hybrid and dynamic fusion of multiple experts for intelligent porch system

Expert Systems with Applications: An International Journal
Two-stage phone duration modelling with feature construction and feature vector extension for the needs of speech synthesis

Computer Speech and Language
Brazilian Portuguese speech-driven answering system

Proceedings of the 6th Euro American Conference on Telematics and Information Systems
Classification of Speech Dysfluencies Using LPC Based Parameterization Techniques

Journal of Medical Systems
Incremental word learning: Efficient HMM initialization and large margin discriminative adaptation

Speech Communication
Automatic recognition of ingestive sounds of cattle based on hidden Markov models

Computers and Electronics in Agriculture
Spoken Content Retrieval: A Survey of Techniques and Technologies

Foundations and Trends in Information Retrieval
ICMI'12 grand challenge: haptic voice recognition

Proceedings of the 14th ACM international conference on Multimodal interaction
Desenvolvendo soluções com interface baseada em voz

Companion Proceedings of the 11th Brazilian Symposium on Human Factors in Computing Systems
Stereo hidden Markov modeling for noise robust speech recognition

Computer Speech and Language
Average framing linear prediction coding with wavelet transform for text-independent speaker identification system

Computers and Electrical Engineering
A speaker recognition based approach for identifying voice spammer

WISM'12 Proceedings of the 2012 international conference on Web Information Systems and Mining
Video search and indexing with reinforcement agent for interactive multimedia services

ACM Transactions on Embedded Computing Systems (TECS) - Special issue on embedded systems for interactive multimedia services (ES-IMS)
Fast Likelihood Computation in Speech Recognition using Matrices

Journal of Signal Processing Systems
Objective evaluation of speech dysfluencies using wavelet packet transform with sample entropy

Digital Signal Processing
Multiple cameras for audio-visual speech recognition in an automotive environment

Computer Speech and Language
Joint training of non-negative Tucker decomposition and discrete density hidden Markov models

Computer Speech and Language
Isolated Word Speech Recognition Based on HRSF and Improved DTW Algorithm

WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
Cry-based classification of healthy and sick infants using adapted boosting mixture learning method for Gaussian mixture models

Modelling and Simulation in Engineering
VLSI design of an SVM learning core on sequential minimal optimization algorithm

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Rapid speaker adaptation in latent speaker space with non-negative matrix factorization

Speech Communication
A historical perspective of speech recognition

Communications of the ACM
Autoregressive modeling of speech trajectory transformed to the reconstructed phase space for ASR purposes

Digital Signal Processing
A smartphone-based ASR data collection tool for under-resourced languages

Speech Communication
An educational platform to demonstrate speech processing techniques on Android based smart phones and tablets

Speech Communication
Low bit rate compression methods of feature vectors for distributed speech recognition

Speech Communication
A fast maximum likelihood nonlinear feature transformation method for GMM-HMM speaker adaptation

Neurocomputing
An overview of digital speech watermarking

International Journal of Speech Technology
Glissando: a corpus for multidisciplinary prosodic studies in Spanish and Catalan

Language Resources and Evaluation
A MAP-based Online Estimation Approach to Ensemble Speaker and Speaking Environment Modeling

IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Automatic music transcription: challenges and future directions

Journal of Intelligent Information Systems
Recognizing Young Readers' Spoken Questions

International Journal of Artificial Intelligence in Education

Quantified Score

Hi-index	0.02

Abstract

From the Publisher:

New advances in spoken language processing: theory and practice
In-depth coverage of speech processing, speech recognition, speech synthesis, spoken language understanding, and speech interface design
Many case studies from state-of-the-art systems, including examples from Microsoft's advanced research labs

Spoken Language Processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond. Starting with the fundamentals, it presents all this and more:

Essential background on speech production and perception, probability and information theory, and pattern recognition
Extracting information from the speech signal: useful representations and practical compression solutions
Modern speech recognition techniques: hidden Markov models, acoustic and language modeling, improving resistance to environmental noises, search algorithms, and large vocabulary speech recognition
Text-to-speech: analyzing documents, pitch and duration controls, trainable synthesis, and more
Spoken language understanding: dialog management, spoken language applications, and multimodal interfaces

To illustrate the book's methods, the authors present detailed case studies based on state-of-the-art systems, including Microsoft's Whisper speech recognizer, Whistler text-to-speech system, Dr. Who dialog system, and the MiPad handheld device. Whether you're planning, designing, building, or purchasing spoken language technology, this is the state of the art, from algorithms through business productivity.