The connection between part-of-speech (POS) categories and morphological properties is well-documented in linguistics but underutilized in text processing systems. This paper proposes a novel model for morphological segmentation that is driven by this connection. Our model learns that words with common affixes are likely to be in the same syntactic category and uses learned syntactic categories to refine the segmentation boundaries of words. Our results demonstrate that incorporating POS categorization yields substantial performance gains on morphological segmentation of Arabic.