Just in time indexing for up to the second search

Authors:
Ronny Lempel;Yosi Mass;Shila Ofek-Koifman;Dafna Sheinwald;Yael Petruschka;Ron Sivan
Affiliations:
IBM Research, Haifa, Israel;IBM Research, Haifa, Israel;IBM Research, Haifa, Israel;IBM Research, Haifa, Israel;IBM Research, Haifa, Israel;IBM Research, Haifa, Israel
Venue:
Proceedings of the sixteenth ACM conference on information and knowledge management
Year:
2007

Abstract

E-commerce and intranet search systems require newly arriving content to be indexed and made searchable within minutes or hours of arrival. Applications such as file system and email search demand even faster turnaround, requiring new content to become searchable almost instantaneously. However, incrementally updating inverted indices, the predominant data structure in search engines, is an expensive operation that most systems avoid performing at high rates. We present JiTI, a Just-in-Time Indexing component that allows searching over incoming content (nearly) as soon as that content reaches the system. JiTI's main idea is to invest less in preprocessing arriving data, at the cost of a tolerable added latency in query response time. It is designed for deployment in search systems that maintain a large main index and that rebuild smaller stop-press indices once or twice an hour. JiTI augments such systems with instant retrieval capabilities over content arriving between the stop-press builds. A main design goal is for JiTI to demand few computational resources, in particular RAM and I/O. Our experiments consisted of injecting several documents and queries per second concurrently into the system over half-hour-long periods. We believe that there are search applications for which the combination of the workloads we experimented with and the response times we measured presents a viable solution to a pressing problem.
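
The abstract does not describe JiTI's internal data structures, so the following is only a toy sketch of the general trade-off it names: newly arrived documents are stored with minimal preprocessing and scanned at query time, while a periodic batch step folds them into a stop-press index. The class `TieredSearch`, its method names, the whitespace tokenizer, and the conjunctive (all-terms) matching are all assumptions made for illustration, not the authors' design.

```python
from collections import defaultdict


class TieredSearch:
    """Toy three-tier search: main index, stop-press index, unindexed buffer.

    Hypothetical illustration only; not JiTI's actual implementation.
    """

    def __init__(self):
        self.main_index = defaultdict(set)        # term -> doc ids, rebuilt rarely
        self.stop_press_index = defaultdict(set)  # term -> doc ids, rebuilt periodically
        self.fresh_docs = {}                      # doc id -> token set, no inverted lists

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase and split on whitespace.
        return set(text.lower().split())

    def add_document(self, doc_id, text):
        # Cheap ingestion: tokenize only and defer inverted-list construction.
        self.fresh_docs[doc_id] = self.tokenize(text)

    def rebuild_stop_press(self):
        # Periodic batch step: fold buffered documents into the stop-press index.
        for doc_id, tokens in self.fresh_docs.items():
            for term in tokens:
                self.stop_press_index[term].add(doc_id)
        self.fresh_docs.clear()

    def search(self, query):
        terms = self.tokenize(query)
        if not terms:
            return set()
        hits = set()
        # Indexed tiers: intersect the posting sets of all query terms.
        for index in (self.main_index, self.stop_press_index):
            hits |= set.intersection(*(index.get(t, set()) for t in terms))
        # Query-time cost: linear scan over documents that arrived since the last rebuild.
        hits |= {d for d, tokens in self.fresh_docs.items() if terms <= tokens}
        return hits


engine = TieredSearch()
engine.add_document("d1", "quarterly sales report")
print(engine.search("sales report"))  # d1 is searchable before any index rebuild
engine.rebuild_stop_press()
print(engine.search("sales report"))  # still found, now through the stop-press index
```

In this sketch the ingestion path does almost no work, and the price is paid in `search`, whose scan grows with the number of documents buffered since the last stop-press build. That mirrors, in a very simplified way, the trade of preprocessing effort for query latency described above.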