A naive theory of affixation and an algorithm for extraction

Authors:
Harald Hammarström
Affiliations:
Chalmers University of Technology, Gothenburg Sweden
Venue:
SIGPHON '06 Proceedings of the Eighth Meeting of the ACL Special Interest Group on Computational Phonology and Morphology
Year:
2006

Citing 21
Cited 4

Guessing morphology from terms and corpora

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Unsupervised learning of word segmentation rules with genetic algorithms and inductive logic programming

Machine Learning - Special issue on inducive logic programming
A Hybrid Approach t Word Segmentation

ILP '98 Proceedings of the 8th International Workshop on Inductive Logic Programming
Automatic Language-Specific Stemming in Information Retrieval

CLEF '00 Revised Papers from the Workshop of Cross-Language Evaluation Forum on Cross-Language Information Retrieval and Evaluation
Modeling and learning multilingual inflectional morphology in a minimally supervised framework

Modeling and learning multilingual inflectional morphology in a minimally supervised framework
Unsupervised learning of the morphology of a natural language

Computational Linguistics
A Bayesian model for morpheme and paradigm identification

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Knowledge-free induction of inflectional morphologies

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Unsupervised learning of morphology for English and Inuktitut

NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Minimally supervised morphological analysis by multimodal alignment

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Unsupervised learning of morphology for building lexicon for a highly inflectional language

MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Unsupervised learning of morphology using a novel directed search algorithm: taking the first step

MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Unsupervised discovery of morphemes

MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Unsupervised learning of morphology without morphemes

MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Unsupervised discovery of morphologically related words based on orthographic and semantic similarity

MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Efficient unsupervised recursive word segmentation using minimum description length

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
The segmentation problem in morphology learning

NeMLaP3/CoNLL '98 Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning
Induction of a simple morphology for highly-inflecting languages

SIGMorPhon '04 Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology
Multilingual noise-robust supervised morphological analysis using the WordFrame model

SIGMorPhon '04 Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology
Using morphology and syntax together in unsupervised learning

PMHLA '05 Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition
The SED heuristic for morpheme discovery: a look at Swahili

PMHLA '05 Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition

Language identification of search engine queries

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Unsupervised morphological segmentation and clustering with document boundaries

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Poor man’s stemming: unsupervised recognition of same-stem words

AIRS'06 Proceedings of the Third Asia conference on Information Retrieval Technology
Optimal stem identification in presence of suffix list

CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a novel approach to the unsupervised detection of affixes, that is, to extract a set of salient prefixes and suffixes from an unlabeled corpus of a language. The underlying theory makes no assumptions on whether the language uses a lot of morphology or not, whether it is prefixing or suffixing, or whether affixes are long or short. It does however make the assumption that 1. salient affixes have to be frequent, i.e occur much more often that random segments of the same length, and that 2. words essentially are variable length sequences of random characters, e.g a character should not occur in far too many words than random without a reason, such as being part of a very frequent affix. The affix extraction algorithm uses only information from fluctation of frequencies, runs in linear time, and is free from thresholds and untransparent iterations. We demonstrate the usefulness of the approach with example case studies on typologically distant languages.