We present an algorithm that takes an unannotated corpus as input and returns a ranked list of probable morphologically related word pairs. The algorithm looks for pairs that are both orthographically and semantically similar, where orthographic similarity is measured by minimum edit distance and semantic similarity by mutual information. The procedure relies neither on a morpheme concatenation model nor on distributional properties of word substrings (such as affix frequency). Experiments with German and English input give encouraging results, both in terms of precision (the proportion of good pairs found at various cutoff points of the ranked list) and in a qualitative analysis of the types of morphological patterns the algorithm discovers.
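To make the two similarity measures concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say how co-occurrence is counted or how the two measures are combined, so the sentence-level co-occurrence window, the `max_dist` cutoff, and the `mi / d` ranking score are all assumptions chosen for illustration, as are the function names.

```python
import math
from collections import Counter
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Minimum edit distance (Levenshtein) with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def pmi_table(sentences):
    """Pointwise mutual information (base 2) for word pairs.

    Assumption: two words co-occur when they appear in the same sentence.
    """
    n = len(sentences)
    word_count = Counter()
    pair_count = Counter()
    for sentence in sentences:
        types = sorted(set(sentence))
        word_count.update(types)
        pair_count.update(combinations(types, 2))
    return {
        (w1, w2): math.log2(c * n / (word_count[w1] * word_count[w2]))
        for (w1, w2), c in pair_count.items()
    }


def rank_pairs(sentences, max_dist=2):
    """Rank candidate pairs that are close in spelling and positively associated.

    Assumption: pairs are scored by PMI divided by edit distance; this
    combination is a placeholder, not the scoring used in the paper.
    """
    scored = []
    for (w1, w2), mi in pmi_table(sentences).items():
        d = edit_distance(w1, w2)
        if 0 < d <= max_dist and mi > 0:
            scored.append((mi / d, w1, w2))
    scored.sort(reverse=True)
    return [(w1, w2) for _, w1, w2 in scored]
```

Given a corpus as a list of tokenized sentences, `rank_pairs(corpus)` returns candidate pairs ordered from most to least plausible under these assumptions. Note that the sketch scores whole words only, in keeping with the abstract's point that no affix or substring statistics are needed.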