Random indexing using statistical weight functions

Authors:
James Gorman; James R. Curran
Affiliations:
University of Sydney, Australia; University of Sydney, Australia
Venue:
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Year:
2006

Abstract

Random Indexing is a vector space technique that provides an efficient and scalable approximation to distributional similarity problems. We present experiments showing that Random Indexing handles large volumes of data poorly, and we evaluate the use of weighting functions to improve its performance. We find that Random Indexing is robust for small data sets, but that performance degrades on large data sets because of the influence of high-frequency attributes. The use of appropriate weight functions improves this significantly.
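
To make the idea in the abstract concrete, the following is a minimal sketch of Random Indexing with a pluggable weight function. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dimensionality (200), the number of non-zero entries per index vector (8), the toy data, and the choice of clipped pointwise mutual information as the weight are all hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def make_index_vector(dim, nonzero, rng):
    """Sparse ternary index vector: `nonzero` entries are +1 or -1, the rest 0."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def raw_weight(count, term_total, attr_total, grand_total):
    """Unweighted baseline: use the raw co-occurrence count."""
    return float(count)


def pmi_weight(count, term_total, attr_total, grand_total):
    """Pointwise mutual information, clipped at zero (an assumed example weight)."""
    pmi = math.log(count * grand_total / (term_total * attr_total))
    return max(pmi, 0.0)


def random_indexing(pairs, weight=raw_weight, dim=200, nonzero=8, seed=0):
    """Build one context vector per term from (term, attribute) co-occurrences.

    Each attribute gets a fixed random index vector; a term's context vector
    is the weighted sum of the index vectors of the attributes it occurs with.
    """
    rng = random.Random(seed)
    counts = Counter(pairs)
    term_totals, attr_totals = Counter(), Counter()
    for (term, attr), n in counts.items():
        term_totals[term] += n
        attr_totals[attr] += n
    grand_total = sum(counts.values())

    # Sorted order keeps the index vectors reproducible for a given seed.
    index = {a: make_index_vector(dim, nonzero, rng) for a in sorted(attr_totals)}
    context = {t: [0.0] * dim for t in term_totals}
    for (term, attr), n in counts.items():
        w = weight(n, term_totals[term], attr_totals[attr], grand_total)
        vec = context[term]
        for i, x in enumerate(index[attr]):
            if x:
                vec[i] += w * x
    return context


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data: "the" is a high-frequency attribute shared by every term.
pairs = (
    [("cat", "the")] * 10 + [("cat", "pet")] * 2
    + [("dog", "the")] * 10 + [("dog", "pet")] * 2
    + [("car", "the")] * 10 + [("car", "drive")] * 2
)
raw = random_indexing(pairs)
weighted = random_indexing(pairs, weight=pmi_weight)
print("raw      cat~car:", cosine(raw["cat"], raw["car"]))
print("weighted cat~car:", cosine(weighted["cat"], weighted["car"]))
```

In this toy setting the shared attribute "the" dominates the unweighted context vectors, so unrelated terms look similar. Under the assumed PMI weight, "the" carries no information about any particular term and contributes nothing, which is one way a weight function can reduce the influence of high-frequency attributes.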