Automatic word segmentation is a basic requirement for unsupervised learning in morphological analysis. In this paper, we formulate a novel recursive method for minimum description length (MDL) word segmentation, whose basic operation is resegmenting the corpus on a prefix (equivalently, a suffix). We derive a local expression for the change in description length under resegmentation, i.e., one which depends only on properties of the specific prefix (not on the rest of the corpus). Such a formulation permits use of a new and efficient algorithm for greedy morphological segmentation of the corpus in a recursive manner. In particular, our method does not restrict words to be segmented only once, into a stem+affix form, as do many extant techniques. Early results for English and Turkish corpora are promising.
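As a rough illustration of the greedy, recursive resegmentation loop described above, the following Python sketch may help. It is not the authors' algorithm: the two-part cost in `description_length`, the constant `BITS_PER_CHAR`, and all function names are assumptions made for this example. In particular, it recomputes the whole description length for every candidate prefix instead of using the local expression for the change that the abstract mentions, so it shows the control flow only, not the efficiency gain.

```python
import math
from collections import Counter

BITS_PER_CHAR = 5.0  # assumed flat cost of one character in the lexicon


def description_length(corpus):
    """Simplified two-part code length in bits: lexicon cost plus corpus cost."""
    counts = Counter(tok for word in corpus for tok in word)
    total = sum(counts.values())
    # Lexicon: each distinct morph is spelled out, plus one end marker.
    lexicon = sum((len(m) + 1) * BITS_PER_CHAR for m in counts)
    # Corpus: each morph occurrence costs -log2 of its relative frequency.
    data = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon + data


def resegment(corpus, prefix):
    """Split every token that strictly extends `prefix` into prefix + rest."""
    out = []
    for word in corpus:
        new_word = []
        for tok in word:
            if len(tok) > len(prefix) and tok.startswith(prefix):
                new_word += [prefix, tok[len(prefix):]]
            else:
                new_word.append(tok)
        out.append(tuple(new_word))
    return out


def candidate_prefixes(corpus):
    """Proper prefixes shared by at least two distinct tokens."""
    seen = Counter()
    for tok in {t for word in corpus for t in word}:
        for i in range(1, len(tok)):
            seen[tok[:i]] += 1
    return [p for p, n in seen.items() if n >= 2]


def greedy_segment(words):
    """Repeatedly apply the prefix split that lowers the cost the most."""
    corpus = [(w,) for w in words]  # start with every word unsegmented
    current = description_length(corpus)
    while True:
        best = None
        for p in sorted(candidate_prefixes(corpus)):
            trial = resegment(corpus, p)
            dl = description_length(trial)
            if dl < current and (best is None or dl < best[0]):
                best = (dl, trial)
        if best is None:  # no split shortens the description: stop
            return corpus
        current, corpus = best
```

Calling `greedy_segment(["walked", "walks", "talked", "talks"])` returns one tuple of morphs per input word. Because the pieces produced by one split are ordinary tokens in the next round, a word can be split more than once, which mirrors the point that segmentation is not limited to a single stem+affix cut. Whether a particular word is actually split depends on the assumed cost constants.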