Global topology of word co-occurrence networks: beyond the two-regime power-law

Authors:
Monojit Choudhury;Diptesh Chatterjee;Animesh Mukherjee
Affiliations:
Microsoft Research Lab India;Indian Institute of Technology Kharagpur;ISI Foundation
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year:
2010



Abstract

Word co-occurrence networks are among the most commonly studied linguistic networks, and they are known to exhibit several interesting topological characteristics. In this article, we investigate the global topological properties of word co-occurrence networks and, in particular, present a detailed study of their spectrum. Our experiments reveal certain universal trends across the networks for seven different languages from three different language families, trends that are neither reported nor explained by any of the previous studies and models of word co-occurrence networks. We hypothesize that, since word co-occurrences are governed by the syntactic properties of a language, the network has a much more constrained topology than that predicted by the previously proposed growth model. A deeper empirical and theoretical investigation into the evolution of these networks further suggests that they have a core-periphery structure, in which the core hardly evolves with time and new words attach only to the periphery of the network. These properties are fundamental to the nature of word co-occurrence across languages.
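
To make the notion of a network spectrum concrete, the following is a minimal sketch of how one might build a word co-occurrence network from a toy corpus and compute the eigenvalues of its adjacency matrix. It is not the authors' pipeline: the function names (cooccurrence_edges, adjacency_spectrum), the window size of one word, the undirected and unweighted adjacency matrix, and the two-sentence corpus are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def cooccurrence_edges(sentences, window=1):
    """Count unordered word pairs that occur within `window` positions of each other."""
    edges = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for neighbour in tokens[i + 1 : i + 1 + window]:
                if neighbour != word:  # ignore self-loops
                    edges[tuple(sorted((word, neighbour)))] += 1
    return edges


def adjacency_spectrum(edges):
    """Return the vocabulary and the eigenvalues (ascending) of the 0/1 adjacency matrix."""
    vocab = sorted({word for pair in edges for word in pair})
    index = {word: i for i, word in enumerate(vocab)}
    adjacency = np.zeros((len(vocab), len(vocab)))
    for u, v in edges:
        # Assumption: edges are undirected and unweighted, so counts are discarded.
        adjacency[index[u], index[v]] = 1.0
        adjacency[index[v], index[u]] = 1.0
    # eigvalsh is appropriate because the matrix is real and symmetric.
    return vocab, np.linalg.eigvalsh(adjacency)


# Hypothetical toy corpus, for illustration only.
toy_corpus = ["the cat sat on the mat", "the dog sat on the rug"]
edges = cooccurrence_edges(toy_corpus)
vocab, eigenvalues = adjacency_spectrum(edges)
print(len(vocab), "words,", len(edges), "edges")
print(np.round(eigenvalues, 3))
```

On a real corpus the matrix would be large and sparse, so a sparse eigensolver that returns only the leading eigenvalues would likely be a more practical choice than the dense routine used here.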