We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects, each of which has different valid instances over its history (e.g., documents in a web archive). Durable top-k search finds the set of objects that are consistently in the top-k results of a query (e.g., a keyword query) throughout a given time interval (e.g., from June 2008 to May 2009). Existing work on temporal top-k queries focuses mainly on finding the most representative top-k elements within a time interval, and such methods are not readily applicable to durable top-k queries. To address this need, we propose two techniques for computing the durable top-k result. The first is adapted from NRA, the classic top-k rank aggregation algorithm. The second is based on a shared execution paradigm and is more efficient than the first. In addition, we propose a special indexing technique for archived data; coupled with a space partitioning technique, this index improves performance further. We use data from Wikipedia and the Internet Archive to demonstrate the efficiency and effectiveness of our solutions.
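To make the problem statement concrete, here is a minimal sketch of a naive baseline: compute the top-k set at every snapshot in the interval and intersect the results. It is not the NRA-based or shared-execution technique proposed in the paper. The names `top_k` and `durable_top_k`, the snapshot dictionary layout, and the strict reading of "consistently" (an object must be in the top-k at every snapshot) are assumptions made for illustration only.

```python
import heapq
from typing import Dict, Hashable, Set

# Hypothetical data model: one score table per discrete timestamp,
# mapping object id -> query score for the version valid at that time.
Snapshots = Dict[int, Dict[Hashable, float]]


def top_k(scores: Dict[Hashable, float], k: int) -> Set[Hashable]:
    """Return the ids of the k highest-scoring objects in one snapshot."""
    # Ties are broken by the string form of the id so results are deterministic.
    best = heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], str(kv[0])))
    return {obj for obj, _ in best}


def durable_top_k(snapshots: Snapshots, k: int, start: int, end: int) -> Set[Hashable]:
    """Naive baseline: objects in the top-k at every snapshot in [start, end]."""
    durable = None
    for t in sorted(snapshots):
        if start <= t <= end:
            current = top_k(snapshots[t], k)
            durable = current if durable is None else durable & current
            if not durable:
                break  # nothing can survive further intersections
    return durable or set()


# Toy data (invented for this example): three snapshots, three objects.
snapshots = {
    1: {"a": 0.9, "b": 0.8, "c": 0.1},
    2: {"a": 0.7, "b": 0.2, "c": 0.6},
    3: {"a": 0.95, "b": 0.5, "c": 0.4},
}
print(durable_top_k(snapshots, k=2, start=1, end=3))
```

In the toy data, only object `a` stays in the top 2 at all three timestamps. This baseline recomputes a full top-k at every snapshot, which is the per-snapshot cost that the techniques described in the abstract aim to avoid.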