Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

  • Authors:
  • Alex Graves; Santiago Fernández; Faustino Gomez; Jürgen Schmidhuber

  • Affiliations:
  • Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Manno-Lugano, Switzerland (all authors); Jürgen Schmidhuber also with Technische Universität München (TUM), Garching, Munich, Germany

  • Venue:
  • ICML '06: Proceedings of the 23rd International Conference on Machine Learning
  • Year:
  • 2006

Abstract

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
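
To make the abstract's idea concrete, below is a minimal sketch (not taken from the paper) of training a recurrent network to label an unsegmented input sequence with a CTC-style loss, using PyTorch's built-in nn.CTCLoss. The feature dimension, layer sizes, label count, and sequence lengths are illustrative assumptions, not values from the TIMIT experiment.

```python
# Minimal CTC training sketch: no frame-level alignment between input and target is required.
import torch
import torch.nn as nn

T, N, C = 50, 4, 28          # input time steps, batch size, classes (27 labels + 1 blank); assumed values
S = 10                        # target label-sequence length (assumed)

rnn = nn.LSTM(input_size=13, hidden_size=64, bidirectional=True)  # 13-dim acoustic features (assumed)
proj = nn.Linear(2 * 64, C)   # map bidirectional RNN states to per-frame label scores
ctc = nn.CTCLoss(blank=0)     # index 0 is reserved for the CTC blank symbol

x = torch.randn(T, N, 13)                                 # unsegmented input frames
targets = torch.randint(1, C, (N, S), dtype=torch.long)   # label sequences, no alignment given
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

h, _ = rnn(x)
log_probs = proj(h).log_softmax(dim=-1)                   # (T, N, C) frame-wise log-probabilities
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                           # gradients flow without pre-segmented training data
```

In this sketch the loss marginalises over all alignments between the frame-wise outputs and the target label sequence, which is what removes the need for pre-segmentation and for post-processing the network outputs into label sequences.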