We propose a method for providing stochastic confidence estimates for rule-based and black-box natural language (NL) processing systems. Our method does not require labeled training data: we simply train stochastic models on the output of the original NL systems. Numeric confidence estimates enable both minimum Bayes risk-style optimization and principled system combination for these knowledge-based and black-box systems. In our specific experiments, we enrich ParaMor, a rule-based system for unsupervised morphology induction, with probabilistic segmentation confidences by training a statistical natural language tagger to simulate ParaMor's morphological segmentations. By adjusting the numeric threshold above which the simulator proposes morpheme boundaries, we improve F1 of morpheme identification on a Hungarian corpus by 5.9% absolute. With numeric confidences in hand, we also combine ParaMor's segmentation decisions with those of a second (black-box) unsupervised morphology induction system, Morfessor. Our joint ParaMor-Morfessor system improves F1 by a further 3.4% absolute, ultimately moving F1 from 41.4% to 50.7%.
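The thresholding and combination steps described above can be illustrated with a minimal Python sketch. It assumes the simulator exposes one confidence value per position between adjacent characters. The function names (`segment`, `boundaries`, `boundary_f1`, `combine_union`), the example word, and all probabilities are hypothetical and not taken from the paper. Boundary-level F1 is used here as a simplified stand-in for the morpheme identification F1 reported above, and the union rule is only one possible way to combine two systems, not necessarily the one the authors used.

```python
from typing import List, Set


def segment(word: str, boundary_probs: List[float], threshold: float) -> List[str]:
    """Split `word` wherever the boundary confidence exceeds `threshold`.

    boundary_probs[i] is the (hypothetical) simulator confidence that a
    morpheme boundary lies between word[i] and word[i + 1].
    """
    if len(boundary_probs) != len(word) - 1:
        raise ValueError("need exactly len(word) - 1 boundary confidences")
    pieces, start = [], 0
    for i, prob in enumerate(boundary_probs):
        if prob > threshold:
            pieces.append(word[start:i + 1])
            start = i + 1
    pieces.append(word[start:])
    return pieces


def boundaries(pieces: List[str]) -> Set[int]:
    """Return the character offsets at which a segmentation places boundaries."""
    positions, offset = set(), 0
    for piece in pieces[:-1]:
        offset += len(piece)
        positions.add(offset)
    return positions


def boundary_f1(predicted: Set[int], gold: Set[int]) -> float:
    """F1 over boundary positions (a simplification of morpheme-level F1)."""
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def combine_union(sim_probs: List[float], other: Set[int], threshold: float) -> Set[int]:
    """Assumed combination rule: union of thresholded simulator boundaries
    with the boundaries proposed by a second system."""
    own = {i + 1 for i, prob in enumerate(sim_probs) if prob > threshold}
    return own | other


# Illustrative word with made-up confidences; assumed gold split: kert + ek + ben.
word = "kertekben"
probs = [0.05, 0.10, 0.05, 0.80, 0.10, 0.45, 0.05, 0.10]
gold = {4, 6}

for threshold in (0.5, 0.4):
    pieces = segment(word, probs, threshold)
    print(threshold, pieces, round(boundary_f1(boundaries(pieces), gold), 3))
```

Lowering the threshold from 0.5 to 0.4 admits the lower-confidence boundary before "ben", which trades precision for recall. Sweeping the threshold and keeping the value with the best F1 is the kind of tuning that numeric confidences make possible and that a system with only hard yes/no decisions cannot offer.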