Index ordering by query-independent measures

Authors:
Paul Ferguson;Alan F. Smeaton
Affiliations:
CLARITY: Centre for Sensor Web Technologies, Dublin City University, Dublin, Ireland;CLARITY: Centre for Sensor Web Technologies, Dublin City University, Dublin, Ireland
Venue:
Information Processing and Management: an International Journal
Year:
2012

Abstract

Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find those documents with the highest scores. For particularly large collections this can be extremely time-consuming. One solution is to search only a limited portion of the collection at query time, which speeds up retrieval; the challenge is to do so while limiting the loss in retrieval effectiveness (in terms of accuracy of results). We achieve this by first identifying the most "important" documents within the collection and then sorting the documents within each inverted file list in order of this importance. At query time, documents of lesser importance can then be skipped, which makes the search more efficient while limiting the loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings in the number of postings examined, without significant loss of effectiveness, when based on several measures of importance used in isolation and in combination. Our results point to several ways in which the computational cost of searching large collections of documents can be significantly reduced.
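The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_index` and `search`, the `fraction` cutoff parameter, the toy importance scores, and the use of raw term frequency as the document score are all assumptions made for illustration. The sketch sorts each postings list by a query-independent importance score, then examines only the leading part of each list at query time and reports how many postings were touched.

```python
import math
from collections import defaultdict


def build_index(docs, importance):
    """Build an inverted index with postings sorted by descending importance.

    `importance` maps doc_id -> a query-independent score (hypothetical values).
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    # Order every postings list so the most "important" documents come first.
    for postings in index.values():
        postings.sort(key=lambda p: importance[p[0]], reverse=True)
    return dict(index)


def search(index, query, fraction=1.0, top_k=10):
    """Score documents using only the first `fraction` of each postings list.

    Scoring is plain term-frequency accumulation (an assumption, chosen for
    brevity). Returns the ranked results and the number of postings examined.
    """
    scores = defaultdict(float)
    examined = 0
    for term in query.lower().split():
        postings = index.get(term, [])
        cutoff = math.ceil(len(postings) * fraction)
        for doc_id, tf in postings[:cutoff]:
            scores[doc_id] += tf
            examined += 1
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k], examined


# Toy collection and importance scores, invented for this example.
docs = {
    "d1": "index ordering index",
    "d2": "index pruning",
    "d3": "query index",
    "d4": "ordering",
}
importance = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.7}

index = build_index(docs, importance)
full_results, full_cost = search(index, "index ordering", fraction=1.0)
pruned_results, pruned_cost = search(index, "index ordering", fraction=0.5)
```

Comparing `full_cost` with `pruned_cost` shows the saving in postings examined, and comparing `full_results` with `pruned_results` shows which documents the cutoff drops. In this toy data the low-importance document `d1` is the best term-frequency match but falls outside the cutoff, which illustrates the efficiency versus accuracy trade-off the abstract refers to.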