Automatic word spacing is one of the important tasks in Korean language processing and information retrieval. Because word spacing in Korean has many confusing cases, spacing errors appear in many texts, including news articles. This paper presents a highly accurate method for automatic word spacing based on a self-organizing n-gram model. The method is essentially a variant of the n-gram model, but it achieves high accuracy by automatically adapting the context size. To find the optimal context size, the proposed method increases the context size when the contextual distribution after the increase does not agree with that of the current context, and decreases the context size when the distribution of the reduced context is similar to that of the current context. This approach achieves high accuracy by considering higher-dimensional data only when necessary, and the increased computational cost is compensated by the reduced context size. The experimental results show that the self-organizing structure of the n-gram model enhances the basic model.
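To make the adaptive context-size idea concrete, the sketch below shows one plausible reading of it as a character-level spacing model: for each character, the model starts from a short context and grows it only while the longer context's spacing distribution disagrees with the current one. This is a minimal illustration, not the authors' implementation: the class name SelfOrganizingNGram, the L1-distance comparison, the threshold value, and the character-level formulation are all assumptions, and only the context-growing step is shown (shrinking the context when a shorter context gives a similar distribution would use the same comparison in reverse).

```python
from collections import Counter, defaultdict


class SelfOrganizingNGram:
    """Illustrative variable-order character n-gram for word spacing (assumed design).

    counts[context] is a Counter over labels: 1 if a space follows the
    context's last character, 0 otherwise.
    """

    def __init__(self, max_order=5, threshold=0.2):
        self.max_order = max_order      # upper bound on context length
        self.threshold = threshold      # assumed L1-distance cut-off
        self.counts = defaultdict(Counter)

    def train(self, spaced_sentence):
        # Strip spaces to get the raw character stream; labels[i] = 1 when a
        # space follows character i in the correctly spaced training text.
        chars, labels = [], []
        for i, ch in enumerate(spaced_sentence):
            if ch == " ":
                continue
            chars.append(ch)
            nxt = spaced_sentence[i + 1] if i + 1 < len(spaced_sentence) else ""
            labels.append(1 if nxt == " " else 0)
        # Collect counts for every context length up to max_order.
        for i, y in enumerate(labels):
            for k in range(1, self.max_order + 1):
                if i + 1 >= k:
                    ctx = "".join(chars[i + 1 - k:i + 1])
                    self.counts[ctx][y] += 1

    def _dist(self, ctx):
        c = self.counts.get(ctx)
        if not c:
            return None
        total = sum(c.values())
        return {y: c[y] / total for y in (0, 1)}

    def predict(self, chars, i):
        # Start from a one-character context and grow it while the longer
        # context's distribution disagrees with the current one; stop as soon
        # as the two distributions are similar or no counts are available.
        k = 1
        dist = self._dist(chars[i:i + 1]) or {0: 0.5, 1: 0.5}
        while k < self.max_order and i - k >= 0:
            longer_dist = self._dist(chars[i - k:i + 1])
            if longer_dist is None:
                break
            if sum(abs(longer_dist[y] - dist[y]) for y in (0, 1)) > self.threshold:
                dist, k = longer_dist, k + 1
            else:
                break
        return 1 if dist[1] >= dist[0] else 0

    def space(self, raw_text):
        # Re-space a text whose spaces are missing or unreliable.
        chars = raw_text.replace(" ", "")
        out = []
        for i, ch in enumerate(chars):
            out.append(ch)
            if i < len(chars) - 1 and self.predict(chars, i) == 1:
                out.append(" ")
        return "".join(out)


if __name__ == "__main__":
    # Toy usage: train on a correctly spaced sentence, then re-space it.
    model = SelfOrganizingNGram()
    model.train("this is a toy example of word spacing")
    print(model.space("thisisatoyexampleofwordspacing"))
```

The design choice illustrated here matches the abstract's trade-off: longer contexts are consulted only where they actually change the spacing decision, so most positions are resolved with short, cheap contexts while ambiguous positions get the benefit of higher-order statistics.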