An algorithm for drawing general undirected graphs
Information Processing Letters
Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A vector space model for automatic indexing
Communications of the ACM
A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic models of information retrieval based on measuring the divergence from randomness
ACM Transactions on Information Systems (TOIS)
A formal study of information retrieval heuristics
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval system evaluation: effort, sensitivity, and reliability
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
PageRank on semantic networks, with application to word sense disambiguation
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Random walk term weighting for information retrieval
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
A comparison of statistical significance tests for information retrieval evaluation
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Introduction to Information Retrieval
Introduction to Information Retrieval
Summarization system evaluation revisited: N-gram graphs
ACM Transactions on Speech and Language Processing (TSLP)
A Text Network Representation Model
FSKD '08 Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 04
Part of Speech Based Term Weighting for Information Retrieval
ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
LexRank: graph-based lexical centrality as salience in text summarization
Journal of Artificial Intelligence Research
Lower-bounding term frequency normalization
Proceedings of the 20th ACM international conference on Information and knowledge management
Graph-based term weighting for information retrieval
Information Retrieval
Efficient query recommendations in the long tail via center-piece subgraphs
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Composition of TF normalizations: new insights on scoring functions for ad hoc IR
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Hi-index | 0.00 |
In this paper, we introduce novel document representation (graph-of-word) and retrieval model (TW-IDF) for ad hoc IR. Questioning the term independence assumption behind the traditional bag-of-word model, we propose a different representation of a document that captures the relationships between the terms using an unweighted directed graph of terms. From this graph, we extract at indexing time meaningful term weights (TW) that replace traditional term frequencies (TF) and from which we define a novel scoring function, namely TW-IDF, by analogy with TF-IDF. This approach leads to a retrieval model that consistently and significantly outperforms BM25 and in some cases its extension BM25+ on various standard TREC datasets. In particular, experiments show that counting the number of different contexts in which a term occurs inside a document is more effective and relevant to search than considering an overall concave term frequency in the context of ad hoc IR.