Re-ranking Documents Based on Query-Independent Document Specificity

Authors:
Lei Zheng;Ingemar J. Cox
Affiliations:
Department of Computer Science, University College London, London, United Kingdom WC1E 6BT;Department of Computer Science, University College London, London, United Kingdom WC1E 6BT
Venue:
FQAS '09 Proceedings of the 8th International Conference on Flexible Query Answering Systems
Year:
2009

Abstract

The use of query-independent knowledge to improve the ranking of documents in information retrieval has proven very effective in the context of web search. This query-independent knowledge is derived from an analysis of the graph structure of hypertext links between documents. However, there are many cases where explicit hypertext links are absent or sparse, e.g. in corporate intranets. Previous work has sought to induce a graph link structure based on various measures of similarity between documents. Once these links are induced, standard link analysis algorithms, e.g. PageRank, can be applied. In this paper, we propose and examine an alternative approach to deriving query-independent knowledge that is not based on link analysis. Instead, we analyze each document independently and calculate a "specificity" score, based on (i) normalized inverse document frequency, and (ii) term entropies. We then discuss two re-ranking strategies, hard cutoff and soft cutoff, that make use of these query-independent "specificity" scores. Experiments on standard TREC test sets show that our re-ranking algorithms produce gains in mean reciprocal rank of about 4%, and 4% to 6% gains in precision at 5 and 10, respectively, when using the collection of TREC disk 4 and queries from TREC 8 ad hoc topics. Empirical tests demonstrate that the entropy-based algorithm produces stable results across (i) retrieval models, (ii) query sets, and (iii) collections.
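
The abstract names the ingredients of the approach but gives no formulas, so the following Python sketch is only one plausible reading and should not be taken as the authors' definitions. It assumes that a document's specificity is the mean over its distinct terms of either IDF normalized by log N, or one minus the term's normalized entropy over documents. It further assumes that a hard cutoff demotes documents whose specificity falls below a threshold, and that a soft cutoff linearly interpolates the retrieval score with the specificity score. All function names and the `threshold` and `weight` parameters are hypothetical.

```python
import math
from collections import Counter


def build_stats(docs):
    """docs maps doc_id -> list of tokens.
    Returns (document frequency per term, term -> {doc_id: term frequency})."""
    doc_freq = Counter()
    postings = {}
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            doc_freq[term] += 1
            postings.setdefault(term, {})[doc_id] = tf
    return doc_freq, postings


def idf_specificity(tokens, doc_freq, n_docs):
    """Assumed definition: mean IDF of the distinct terms, divided by log(n_docs)."""
    terms = set(tokens)
    if not terms or n_docs < 2:
        return 0.0
    max_idf = math.log(n_docs)
    total = sum(math.log(n_docs / doc_freq[t]) for t in terms)
    return total / (len(terms) * max_idf)


def term_entropy(term, postings):
    """Entropy of the term's occurrence distribution over documents."""
    counts = postings[term].values()
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def entropy_specificity(tokens, postings, n_docs):
    """Assumed definition: one minus the mean normalized entropy of the distinct terms."""
    terms = set(tokens)
    if not terms or n_docs < 2:
        return 0.0
    mean_entropy = sum(term_entropy(t, postings) for t in terms) / len(terms)
    return 1.0 - mean_entropy / math.log(n_docs)


def rerank_hard(ranked, specificity, threshold):
    """Assumed hard cutoff: documents below the threshold move to the end of
    the list; relative order within each group is preserved."""
    keep = [(d, s) for d, s in ranked if specificity[d] >= threshold]
    demoted = [(d, s) for d, s in ranked if specificity[d] < threshold]
    return keep + demoted


def rerank_soft(ranked, specificity, weight):
    """Assumed soft cutoff: interpolate retrieval score and specificity, then re-sort."""
    rescored = [(d, (1 - weight) * s + weight * specificity[d]) for d, s in ranked]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy collection, invented for illustration only.
    docs = {
        "d1": ["the", "of", "and"],
        "d2": ["the", "pagerank", "eigenvector"],
        "d3": ["the", "of", "entropy"],
    }
    doc_freq, postings = build_stats(docs)
    spec = {d: entropy_specificity(t, postings, len(docs)) for d, t in docs.items()}
    ranked = [("d1", 0.9), ("d2", 0.8), ("d3", 0.7)]
    print(rerank_hard(ranked, spec, threshold=0.5))
    print(rerank_soft(ranked, spec, weight=0.5))
```

Under these assumptions the specificity scores are computed once per collection, independently of any query, and only the final re-ranking step touches the query-dependent retrieval scores. The retrieval scores passed to `rerank_soft` are assumed to be on a scale comparable to [0, 1]; otherwise they would need normalizing first.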