On learning the past tenses of English verbs
Parallel distributed processing: explorations in the microstructure of cognition, vol. 2
Some advances in transformation-based part of speech tagging
AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1)
A stochastic finite-state word-segmentation algorithm for Chinese
Computational Linguistics
IGTree: Using Trees for Compression and Classification in Lazy LearningAlgorithms
Artificial Intelligence Review - Special issue on lazy learning
An Efficient, Probabilistically Sound Algorithm for Segmentation andWord Discovery
Machine Learning - Special issue on natural language learning
Handbook of Natural Language Processing
Handbook of Natural Language Processing
The Unsupervised Acquisition of a Lexicon from Continuous Speech
The Unsupervised Acquisition of a Lexicon from Continuous Speech
Automatic rule induction for unknown-word guessing
Computational Linguistics
Improving Chinese tokenization with linguistic filters on statistical lexical acquisition
ANLC '94 Proceedings of the fourth conference on Applied natural language processing
A stochastic finite-state word-segmentation algorithm for Chinese
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
A Mathematical Theory of Communication
A Mathematical Theory of Communication
Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Computational Linguistics
COMA: a system for flexible combination of schema matching approaches
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Boosting-based ensemble learning with penalty profiles for automatic Thai unknown word recognition
Computers & Mathematics with Applications
Hi-index | 0.00 |
In this paper, we formulate a generalized method of automatic word segmentation. The method uses corpus type frequency information to choose the type with maximum length and frequency from "desegmented" text. It also uses a modified forward-backward matching technique using maximum length frequency and entropy rate if any non-matching portions of the text exist. The method is also extendible to a dictionary-based or hybrid method with some additions to the algorithms. Evaluation results show that our method outperforms several competing methods.