Effect of word density on measuring words association

Authors:
Sanasam Ranbir Singh;Hema A. Murthy;Timothy A. Gonsalves
Affiliations:
Indian Institute of Technology Madras, Chennai, India;Indian Institute of Technology Madras, Chennai, India;Indian Institute of Technology Madras, Chennai, India
Venue:
COMPUTE '08 Proceedings of the 1st Bangalore Annual Compute Conference
Year:
2008

Abstract

Mining associated words is not a new problem, but its wide range of applications keeps it an important issue in Information Retrieval. Existing estimators, such as joint probability and the word association norm, do not consider the density of the words present in each window. In this paper, we incorporate word density and propose a density-based estimator to measure the association between words. Experimental results based on human judgments and on precision collected from search engines show that the precision of the estimators can be improved by incorporating word density. Across all window sizes considered, our estimator outperforms all other estimators. We also observe that all of these estimators, both the existing ones and the proposed one, perform relatively better when the windows contain around five sentences. Finally, using the Spearman rank-order correlation coefficient, we show that our estimator produces a better-quality ranking of the associated terms.
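
To make the baseline estimators named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each window is already tokenized into a list of words, estimates probabilities as the fraction of windows containing a word or word pair, and takes the word association norm to be the usual log-ratio form log2(P(x,y) / (P(x)P(y))). The function names and the toy windows are hypothetical, and the Spearman helper assumes there are no tied scores. The density-based estimator is not defined in the abstract, so it is not attempted here.

```python
import math
from collections import Counter
from itertools import combinations


def window_counts(windows):
    """Count how many windows contain each word and each unordered word pair."""
    word_df, pair_df = Counter(), Counter()
    for window in windows:
        vocab = sorted(set(window))  # presence only: repeats inside a window are ignored
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df


def joint_probability(x, y, windows):
    """Fraction of windows in which x and y co-occur."""
    _, pair_df = window_counts(windows)
    return pair_df[tuple(sorted((x, y)))] / len(windows)


def association_norm(x, y, windows):
    """log2(P(x, y) / (P(x) * P(y))), or -inf if x and y never co-occur."""
    word_df, pair_df = window_counts(windows)
    n = len(windows)
    p_xy = pair_df[tuple(sorted((x, y)))] / n
    if p_xy == 0:
        return float("-inf")
    return math.log2(p_xy / ((word_df[x] / n) * (word_df[y] / n)))


def ranks(scores):
    """Rank scores from highest (rank 1) to lowest; assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(scores_a, scores_b):
    """Spearman rank-order correlation for two equally long, tie-free score lists."""
    n = len(scores_a)
    d_squared = sum((ra - rb) ** 2 for ra, rb in zip(ranks(scores_a), ranks(scores_b)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical toy corpus: four windows of tokenized text.
windows = [
    ["search", "engine", "query"],
    ["search", "engine"],
    ["query", "expansion"],
    ["web", "crawler"],
]

print(joint_probability("search", "engine", windows))
print(association_norm("search", "engine", windows))
# Compare a hypothetical human ranking with a hypothetical estimator ranking.
print(spearman_rho([0.9, 0.7, 0.4, 0.1], [0.8, 0.9, 0.3, 0.2]))
```

Because both estimators above count only whether a word appears in a window, a window that mentions a word once and a window that mentions it many times contribute equally; this is the information the abstract says existing estimators leave unused.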