Fast online training with frequency-adaptive learning rates for Chinese word segmentation and new word detection

Authors:
Xu Sun; Houfeng Wang; Wenjie Li
Affiliations:
The Hong Kong Polytechnic University; Peking University, China; The Hong Kong Polytechnic University
Venue:
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Year:
2012

Abstract

We present a joint model for Chinese word segmentation and new word detection. For this joint model, we introduce new high-dimensional features, including word-based features and enriched edge (label-transition) features. Training a word segmentation system on large-scale datasets is already costly, and adding high-dimensional features slows training down further. To address this problem, we propose a new training method, adaptive online gradient descent based on feature frequency information, for very fast online training of the parameters, even on large-scale datasets with high-dimensional features. Compared with existing training methods, our method is an order of magnitude faster in training time and achieves equal or even higher accuracy. The proposed fast training method is a general-purpose optimization method and is not limited to the specific task discussed in this paper.
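The central idea of the training method, per-feature learning rates that adapt to how often each feature fires, can be illustrated with a small sketch. This is an illustrative assumption, not the paper's algorithm: it uses a sparse binary logistic-regression model instead of a segmentation model, and the decay rule (each feature's rate is multiplied by a factor between the hypothetical parameters `upper` and `lower`, according to the feature's relative frequency within a period of `period` examples) is a stand-in chosen for clarity. The function name `train_freq_adaptive` and all parameter values are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_freq_adaptive(data, epochs=5, init_rate=0.1, period=10,
                        lower=0.9, upper=0.995):
    """Hypothetical frequency-adaptive online gradient descent.

    data: list of (features, label) pairs; features is a list of binary
    feature names, label is 0 or 1. Returns (weights, per-feature rates).
    """
    weights = defaultdict(float)
    rates = defaultdict(lambda: init_rate)   # one learning rate per feature
    counts = defaultdict(int)                # feature frequency in the current period
    seen = 0
    for _ in range(epochs):
        for feats, label in data:
            prob = sigmoid(sum(weights[f] for f in feats))
            grad = prob - label              # d(log loss)/d(score)
            for f in feats:
                weights[f] -= rates[f] * grad
                counts[f] += 1
            seen += 1
            if seen % period == 0 and counts:
                # Assumed decay rule: the most frequent feature in the period
                # is multiplied by `lower`, rarer ones by a factor closer to
                # `upper`. Features that did not fire keep their rate.
                max_count = max(counts.values())
                for f, c in counts.items():
                    rates[f] *= upper - (upper - lower) * c / max_count
                counts.clear()
    return weights, rates
```

For example, with `data = [(["bias", "a"], 1), (["bias", "b"], 0)] * 20`, the always-active `bias` feature ends up with a smaller learning rate than the rarer features `a` and `b`, so rare features keep taking larger steps. This mirrors the motivation for frequency-adaptive rates described above.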