Using eigenvectors of the bigram graph to infer morpheme identity

Authors:
Mikhail Belkin;John Goldsmith
Affiliations:
University of Chicago, Chicago, IL;University of Chicago, Chicago, IL
Venue:
MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Year:
2002

Citing 3

Class-based n-gram models of natural language. Computational Linguistics.
Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Unsupervised learning of the morphology of a natural language. Computational Linguistics.

Cited 5

Genetic algorithms for data-driven web question answering. Evolutionary Computation.
Soft Uncoupling of Markov chains for Permeable Language Distinction: A New Algorithm. Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy.
Discovering global patterns in linguistic networks through spectral analysis: a case study of the consonant inventories. EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics.
Syntax is from Mars while semantics from Venus!: insights from spectral analysis of distributional similarity networks. ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers.
Global topology of word co-occurrence networks: beyond the two-regime power-law. COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters.

Abstract

This paper describes the results of some experiments exploring statistical methods to infer syntactic categories from a raw corpus in an unsupervised fashion. It shares certain points with Brown et al. (1992) and the work that has grown out of it: it employs statistical techniques to derive categories based on what words occur adjacent to a given word. However, we use an eigenvector decomposition of a nearest-neighbor graph to produce a two-dimensional rendering of the words of a corpus in which words of the same syntactic category tend to form clusters and neighborhoods. We exploit this technique to extend the value of automatic learning of morphology. In particular, we look at the suffixes derived from a corpus by unsupervised learning of morphology, and we ask which of these suffixes have a consistent syntactic function (e.g., in English, -ed is primarily a mark of verbal past tense, but -s marks both noun plurals and 3rd person present on verbs).
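
The abstract does not spell out how the nearest-neighbor graph is built or which eigenvectors are used, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes (these choices are ours, for illustration) that each word is described by counts of its left and right neighbors, that words are linked to their k most cosine-similar words, and that the two-dimensional coordinates are the second and third eigenvectors of the symmetric normalized graph Laplacian. The function name `spectral_embedding_2d`, the parameter `k`, and the toy corpus are hypothetical.

```python
import numpy as np


def spectral_embedding_2d(tokens, k=2):
    """Place each word type in 2-D using eigenvectors of a k-NN graph.

    Assumption: the graph construction below (cosine similarity over
    left/right neighbor counts, symmetric k-NN, normalized Laplacian)
    is one plausible reading of the abstract, not the paper's recipe.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)

    # Context vectors: first n columns count left neighbors,
    # last n columns count right neighbors.
    ctx = np.zeros((n, 2 * n))
    for left, right in zip(tokens, tokens[1:]):
        ctx[index[right], index[left]] += 1
        ctx[index[left], n + index[right]] += 1

    # Cosine similarity between context vectors.
    norms = np.linalg.norm(ctx, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against words with no context
    unit = ctx / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a word is not its own neighbor

    # Symmetric k-nearest-neighbor adjacency matrix.
    adj = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(sim[i])[-k:]:
            adj[i, j] = adj[j, i] = 1.0

    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; skip the first
    # eigenvector and keep the next two as x/y coordinates.
    _, eigvecs = np.linalg.eigh(lap)
    return vocab, eigvecs[:, 1:3]


# Toy corpus (hypothetical); a real experiment would use a large raw corpus.
tokens = ("the cat sat on the mat the dog sat on the rug "
          "a cat ran a dog ran").split()
vocab, coords = spectral_embedding_2d(tokens, k=2)
for word, (x, y) in zip(vocab, coords):
    print(f"{word:>4s}  {x:+.3f}  {y:+.3f}")
```

On a corpus this small the layout is not meaningful, and if the neighbor graph is disconnected the leading eigenvectors mostly separate its components. The sketch is meant only to show the shape of the computation: neighbor counts, a nearest-neighbor graph, and an eigenvector decomposition that yields two coordinates per word.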