Long distance bigram models applied to word clustering

Authors:
Nikoletta Bassiou;Constantine Kotropoulos
Affiliations:
Department of Informatics, Aristotle University of Thessaloniki, Box 451, Thessaloniki 541 24, Greece;Department of Informatics, Aristotle University of Thessaloniki, Box 451, Thessaloniki 541 24, Greece
Venue:
Pattern Recognition
Year:
2011




Abstract

Two novel word clustering techniques that employ long distance bigram language models are proposed. The first technique is built on a hierarchical clustering algorithm and minimizes the sum of the Mahalanobis distances, measured after a cluster merger, of all words from the centroid of the class created by the merger. The second technique resorts to probabilistic latent semantic analysis (PLSA). Interpolated long distance bigrams are then considered within both clustering techniques. Experiments conducted on the English Gigaword corpus (second edition) demonstrate that: (1) long distance bigrams, when employed in the two clustering techniques under study, yield word clusters of better quality than the baseline bigrams; (2) interpolated long distance bigrams outperform long distance bigrams in the same respect; (3) long distance bigrams perform better than bigrams that incorporate trigger pairs selected at various distances; and (4) the best word clustering is achieved by PLSA employing interpolated long distance bigrams. Both proposed techniques outperform spectral clustering based on k-means. To assess the quality of the created clusters objectively, relative cluster validity indices are estimated, and the average cluster sense precision, the average cluster sense recall, and the F-measure are computed using ground truth extracted from WordNet.
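
As a rough illustration of the long distance bigram idea mentioned in the abstract, the sketch below counts word pairs separated by a fixed distance d, turns the counts into maximum-likelihood conditional probabilities, and mixes several distances into one interpolated estimate. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the definition of a distance-d bigram as the pair (word d positions back, current word) is the usual one rather than one quoted from the paper, and the uniform interpolation weights are a placeholder because the abstract does not say how the weights are chosen.

```python
from collections import Counter, defaultdict


def distance_bigram_counts(tokens, d):
    """Count pairs (h, w) where h occurs exactly d positions before w."""
    return Counter(zip(tokens, tokens[d:]))


def distance_bigram_probs(tokens, d):
    """Maximum-likelihood estimate of P_d(w | h) from distance-d pair counts."""
    pair_counts = distance_bigram_counts(tokens, d)
    history_counts = Counter()
    for (h, _), c in pair_counts.items():
        history_counts[h] += c
    return {(h, w): c / history_counts[h] for (h, w), c in pair_counts.items()}


def interpolated_probs(tokens, max_distance, weights=None):
    """Linear mixture of P_1 .. P_max_distance.

    Assumption: uniform weights unless the caller supplies their own;
    the abstract does not specify the weighting scheme.
    """
    if weights is None:
        weights = [1.0 / max_distance] * max_distance
    mixed = defaultdict(float)
    for d, lam in zip(range(1, max_distance + 1), weights):
        for pair, p in distance_bigram_probs(tokens, d).items():
            mixed[pair] += lam * p
    return dict(mixed)


tokens = "the cat sat on the mat".split()
probs = interpolated_probs(tokens, max_distance=2)
```

With d = 1 this reduces to the ordinary (baseline) bigram model. Rows of such probability tables, one per word, could serve as the feature vectors that a clustering algorithm operates on, although the exact representation used in the paper is not given in the abstract.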