Bootstrapping morphological analyzers by combining human elicitation and machine learning

Authors:
Kemal Oflazer;Sergei Nirenburg;Marjorie McShane
Affiliations:
Sabanci University;New Mexico State University;New Mexico State University
Venue:
Computational Linguistics
Year:
2001

Citing 16
Cited 9

Regular models of phonological rule systems

Computational Linguistics - Special issue on computational phonology
Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging

Computational Linguistics
Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction

Computational Linguistics
A technique for computer detection and correction of spelling errors

Communications of the ACM
Stochastic Complexity in Statistical Inquiry Theory

Stochastic Complexity in Statistical Inquiry Theory
A Rational Design for a Weighted Finite-State Transducer Library

WIA '97 Revised Papers from the Second International Workshop on Implementing Automata
An Extendible Regular Expression Compiler for Finite-State Approaches in Natural Language Processing

WIA '99 Revised Papers from the 4th International Workshop on Automata Implementation
Multitiered nonlinear morphology using multitape finite automata: a case study on Syriac and Arabic

Computational Linguistics - Special issue on finite-state methods in NLP
Automatic acquisition of two-level morphological rules

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Regular expressions for language engineering

Natural Language Engineering
String transformation learning

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
A discovery procedure for certain phonological rules

ACL '84 Proceedings of the 10th International Conference on Computational Linguistics and 22nd annual meeting on Association for Computational Linguistics
Constructing lexical transducers

COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Two-level morphology with composition

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 1
Arabic finite-state morphological analysis and generation

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
A multilingual natural-language interface to regular expressions

FSMNLP '09 Proceedings of the International Workshop on Finite State Methods in Natural Language Processing

Mood and modality: out of theory and into the fray

Natural Language Engineering
Parameterizing and Eliciting Text Elements across Languages for Use in Natural Language Processing Systems

Machine Translation
Enriching the class diagram concepts to capture natural language semantics for database access

Data & Knowledge Engineering
Guessers for Finite-State Transducer Lexicons

CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Multilingual noise-robust supervised morphological analysis using the WordFrame model

SIGMorPhon '04 Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology
Learning probabilistic paradigms for morphology in a latent class model

SIGPHON '06 Proceedings of the Eighth Meeting of the ACL Special Interest Group on Computational Phonology and Morphology
Enhanced word decomposition by calibrating the decision threshold of probabilistic models and using a model ensemble

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Weakly supervised morphology learning for agglutinating languages using small training sets

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Investigating the Relationship Between Linguistic Representation and Computation through an Unsupervised Model of Human Morphology Learning

Research on Language and Computation

Abstract

This paper presents a semiautomatic technique for developing broad-coverage finite-state morphological analyzers for use in natural language processing applications. It consists of three components: elicitation of linguistic information from humans, a machine learning bootstrapping scheme, and a testing environment. The three components are applied iteratively until a threshold of output quality is attained. The initial application of this technique is for the morphology of low-density languages in the context of the Expedition project at NMSU Computing Research Laboratory. This elicit-build-test technique compiles lexical and inflectional information elicited from a human into a finite-state transducer lexicon and combines this with a sequence of morphographemic rewrite rules that is induced using transformation-based learning from the elicited examples. The resulting morphological analyzer is then tested against a test set, and any corrections are fed back into the learning procedure, which then builds an improved analyzer.
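
The elicit-build-test cycle described in the abstract can be illustrated with a toy sketch. It is not the authors' system: it uses plain string replacement in place of finite-state transducers, works only in the generation direction (lexical form to surface form), and greedily picks the rewrite rule that removes the most errors, in the spirit of transformation-based learning. The "+" morpheme boundary, the English plural data, and all function names (propose, learn_rules, elicit_build_test) are assumptions made for illustration.

```python
# Toy elicit-build-test loop with greedy, error-driven rule induction.
# All names and data here are hypothetical; "+" marks a morpheme boundary.

def strip_boundary(form):
    return form.replace("+", "")

def apply_rules(lexical, rules):
    """Apply the ordered rewrite rules, then delete leftover boundaries."""
    form = lexical
    for lhs, rhs in rules:
        form = form.replace(lhs, rhs)
    return strip_boundary(form)

def propose(current, target):
    """Candidate rewrites taken from the span where the two strings differ,
    with zero or one character of left context."""
    limit = min(len(current), len(target))
    p = 0  # length of the common prefix
    while p < limit and current[p] == target[p]:
        p += 1
    s = 0  # length of the common suffix (not overlapping the prefix)
    while s < limit - p and current[-1 - s] == target[-1 - s]:
        s += 1
    out = []
    for ctx in (0, 1):
        start = max(p - ctx, 0)
        lhs = current[start:len(current) - s]
        rhs = target[start:len(target) - s]
        if lhs and (lhs, rhs) not in out:
            out.append((lhs, rhs))
    return out

def learn_rules(examples):
    """Greedily add the rule with the largest net error reduction."""
    current = [lex for lex, _ in examples]
    targets = [surf for _, surf in examples]
    rules = []

    def errors(forms):
        return sum(strip_boundary(f) != t for f, t in zip(forms, targets))

    while errors(current) > 0:
        candidates = []
        for form, target in zip(current, targets):
            if strip_boundary(form) != target:
                for rule in propose(form, target):
                    if rule not in candidates:
                        candidates.append(rule)

        def gain(rule):
            rewritten = [f.replace(*rule) for f in current]
            return errors(current) - errors(rewritten)

        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) <= 0:
            break  # no candidate improves the training set
        rules.append(best)
        current = [f.replace(*best) for f in current]
    return rules

def elicit_build_test(seed, test_set, max_rounds=5):
    """Build from elicited examples, test, and feed failures back in."""
    examples = list(seed)
    rules = []
    for _ in range(max_rounds):
        rules = learn_rules(examples)                      # build
        failures = [(lex, surf) for lex, surf in test_set  # test
                    if apply_rules(lex, rules) != surf]
        if not failures:
            break
        examples.extend(f for f in failures if f not in examples)  # correct
    return rules

seed = [("fly+s", "flies"), ("cat+s", "cats")]
test_set = [("try+s", "tries"), ("box+s", "boxes"),
            ("fox+s", "foxes"), ("dog+s", "dogs")]
rules = elicit_build_test(seed, test_set)
print(rules)
print(apply_rules("spy+s", rules))
```

In this sketch the first round learns only from the seed pairs, the test step exposes the forms it gets wrong, and those corrections become training examples for the next round, which mirrors the feedback step the abstract describes. A real analyzer would compile the lexicon and rules into a transducer rather than chaining string replacements.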