Unsupervised learning of the morphology of a natural language
Computational Linguistics
Knowledge-free induction of inflectional morphologies
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Unsupervised models for morpheme segmentation and morphology learning
ACM Transactions on Speech and Language Processing (TSLP)
Contrastive estimation: training log-linear models on unlabeled data
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A hybrid Markov/semi-Markov conditional random field for sequence segmentation
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Joint unsupervised coreference resolution with Markov logic
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised multilingual learning for POS tagging
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Cross-lingual propagation for morphological analysis
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Using a maximum entropy model to build segmentation lattices for MT
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Joint inference for knowledge extraction from biomedical literature
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Unsupervised search for the optimal segmentation for statistical machine translation
ACLstudent '10 Proceedings of the ACL 2010 Student Research Workshop
Semi-supervised learning of concatenative morphology
SIGMORPHON '10 Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology
Predicting the semantic compositionality of prefix verbs
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
EMMA: a novel Evaluation Metric for Morphological Analysis
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Simulating morphological analyzers with stochastic taggers for confidence estimation
CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Learning sub-word units for open vocabulary speech recognition
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Modeling syntactic context improves morphological segmentation
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Research on Language and Computation
Universal morphological analysis using structured nearest neighbor prediction
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Fast generation of translation forest for large-scale SMT discriminative training
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Automatic event extraction with structured preference modeling
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Unsupervised morphology rivals supervised morphology for Arabic MT
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
The study of effect of length in morphological segmentation of agglutinative languages
MM '12 Proceedings of the First Workshop on Multilingual Modeling
SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing
Hi-index | 0.00 |
Morphological segmentation breaks words into morphemes (the basic semantic units). It is a key component for natural language processing systems. Unsupervised morphological segmentation is attractive, because in every language there are virtually unlimited supplies of text, but very few labeled resources. However, most existing model-based systems for unsupervised morphological segmentation use directed generative models, making it difficult to leverage arbitrary overlapping features that are potentially helpful to learning. In this paper, we present the first log-linear model for unsupervised morphological segmentation. Our model uses overlapping features such as morphemes and their contexts, and incorporates exponential priors inspired by the minimum description length (MDL) principle. We present efficient algorithms for learning and inference by combining contrastive estimation with sampling. Our system, based on monolingual features only, outperforms a state-of-the-art system by a large margin, even when the latter uses bilingual information such as phrasal alignment and phonetic correspondence. On the Arabic Penn Treebank, our system reduces F1 error by 11% compared to Morfessor.