A training algorithm for optimal margin classifiers
COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Document filtering for fast ranking
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
The nature of statistical learning theory
The nature of statistical learning theory
Filtered document retrieval with frequency-sorted indexes
Journal of the American Society for Information Science
Self-indexing inverted files for fast text retrieval
ACM Transactions on Information Systems (TOIS)
Optimization of inverted vector searches
SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Spatial querying for image retrieval: a user-oriented evaluation
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Journal of the American Society for Information Science
Making large-scale support vector machine learning practical
Advances in kernel methods
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
Incorporating quality metrics in centralized/distributed information retrieval on the World Wide Web
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Does “authority” mean quality? predicting expert quality ratings of Web documents
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Vector-space ranking with effective early termination
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Static index pruning for information retrieval systems
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
The nearest neighbour problem in information retrieval: an algorithm using upperbounds
SIGIR '81 Proceedings of the 4th annual international ACM SIGIR conference on Information storage and retrieval: theoretical issues in information retrieval
Modern Information Retrieval
Mining the Web's Link Structure
Computer
Optimizing search engines using clickthrough data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Predictive caching and prefetching of query results in search engines
WWW '03 Proceedings of the 12th international conference on World Wide Web
Efficient query evaluation using a two-level retrieval process
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Dempster-Shafer Theory for a Query-Biased Combination of Evidence on the Web
Information Retrieval
Exploiting the hierarchical structure for link analysis
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Simplified similarity scoring using term ranks
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Top subset retrieval on large collections using sorted indices
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
ACM Transactions on Information Systems (TOIS)
Detecting spam web pages through content analysis
Proceedings of the 15th international conference on World Wide Web
A document-centric approach to static index pruning in text retrieval systems
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Pruning policies for two-tiered inverted index with correctness guarantee
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Optimized query execution in large search engines with global page ordering
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Performance of compressed inverted list caching in search engines
Proceedings of the 17th international conference on World Wide Web
Linguistic Analysis of Users' Queries: Towards an Adaptive Information Retrieval System
SITIS '07 Proceedings of the 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System
A survey of pre-retrieval query performance predictors
Proceedings of the 17th ACM conference on Information and knowledge management
Term Impacts as Normalized Term Frequencies for BM25 Similarity Scoring
SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Físréal: a low cost terabyte search engine
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Hi-index | 0.00 |
Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find those documents with the highest scores. For particularly large collections this may be extremely time consuming. A solution to this problem is to only search a limited amount of the collection at query-time, in order to speed up the retrieval process. In doing this we can also limit the loss in retrieval efficacy (in terms of accuracy of results). The way we achieve this is to firstly identify the most ''important'' documents within the collection, and sort documents within inverted file lists in order of this ''importance''. In this way we limit the amount of information to be searched at query time by eliminating documents of lesser importance, which not only makes the search more efficient, but also limits loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings, in terms of number of postings examined, without significant loss of effectiveness when based on several measures of importance used in isolation, and in combination. Our results point to several ways in which the computation cost of searching large collections of documents can be significantly reduced.