E-commerce and intranet search systems require newly arriving content to be indexed and made available for search within minutes or hours of arrival. Applications such as file system and email search demand even faster turnaround, requiring new content to become searchable almost instantaneously. However, incrementally updating inverted indices, the predominant data structure used in search engines, is an expensive operation that most systems avoid performing at high rates. We present JiTI, a Just-in-Time Indexing component that allows searching over incoming content (nearly) as soon as that content reaches the system. JiTI's main idea is to invest less in the preprocessing of arriving data, at the expense of a tolerable latency in query response time. It is designed for deployment in search systems that maintain a large main index and that rebuild smaller stop-press indices once or twice an hour. JiTI augments such systems with instant retrieval capabilities over content arriving between the stop-press builds. A main design point is for JiTI to demand few computational resources, in particular RAM and I/O. Our experiments consisted of injecting several documents and queries per second concurrently into the system over half-hour-long periods. We believe that there are search applications for which the combination of the workloads we experimented with and the response times we measured presents a viable solution to a pressing problem.
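
The trade-off described above (cheap ingestion, somewhat slower queries) can be illustrated with a minimal sketch. This is not JiTI's actual design, which the abstract does not detail; it only assumes that fresh documents are stored with almost no preprocessing and scanned at query time, while older documents live in a conventional inverted index. All class and function names (`MainIndex`, `JustInTimeBuffer`, `search_all`) are hypothetical.

```python
from collections import defaultdict


class MainIndex:
    """Conventional inverted index: costly to build, fast to query."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def build(self, docs):
        # Full preprocessing: tokenize each document and fill posting lists.
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.postings[term].add(doc_id)

    def search(self, term):
        # Return a copy so callers cannot mutate the posting list.
        return set(self.postings.get(term.lower(), set()))


class JustInTimeBuffer:
    """Hypothetical buffer for content that arrived since the last build.

    Assumption: ingestion only tokenizes; no posting lists are maintained,
    so each query pays for a linear scan over the buffered documents.
    """

    def __init__(self):
        self.pending = {}  # doc_id -> list of tokens

    def add(self, doc_id, text):
        # Cheap ingest: lowercase and split, nothing else.
        self.pending[doc_id] = text.lower().split()

    def search(self, term):
        # Query-time work: scan every buffered document for the term.
        term = term.lower()
        return {d for d, tokens in self.pending.items() if term in tokens}

    def drain(self):
        # Hand buffered documents to the next periodic index build.
        docs = {d: " ".join(tokens) for d, tokens in self.pending.items()}
        self.pending = {}
        return docs


def search_all(term, main_index, buffer):
    # A query sees both the indexed content and the fresh content.
    return main_index.search(term) | buffer.search(term)
```

In this sketch a document passed to `add` is visible to `search_all` immediately, and calling `main_index.build(buffer.drain())` stands in for the periodic stop-press build that moves buffered content into a proper index. The linear scan keeps ingestion cost low but makes query latency grow with the amount of buffered content, which is the kind of trade-off the abstract describes.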