Just in time indexing for up to the second search

  • Authors:
  • Ronny Lempel; Yosi Mass; Shila Ofek-Koifman; Dafna Sheinwald; Yael Petruschka; Ron Sivan

  • Affiliations:
  • IBM Research, Haifa, Israel (all authors)

  • Venue:
  • Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
  • Year:
  • 2007

Abstract

E-commerce and intranet search systems require newly arriving content to be indexed and made available for search within minutes or hours of arrival. Applications such as file system and email search demand even faster turnaround from search systems, requiring new content to become available for search almost instantaneously. However, incrementally updating inverted indices, which are the predominant data structure used in search engines, is an expensive operation that most systems avoid performing at high rates. We present JiTI, a Just-in-Time Indexing component that allows searching over incoming content (nearly) as soon as that content reaches the system. JiTI's main idea is to invest less in the preprocessing of arriving data, at the expense of a tolerable latency in query response time. It is designed for deployment in search systems that maintain a large main index and that rebuild smaller stop-press indices once or twice an hour. JiTI augments such systems with instant retrieval capabilities over content arriving in between the stop-press builds. A main design point is for JiTI to demand few computational resources, in particular RAM and I/O. Our experiments consisted of injecting several documents and queries per second concurrently into the system over half-hour-long periods. We believe that there are search applications for which the combination of the workloads we experimented with and the response times we measured present a viable solution to a pressing problem.
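
The trade-off the abstract describes (cheap ingestion now, extra work at query time) can be illustrated with a toy sketch. This is not JiTI's actual design, which the abstract does not specify. The names `MainIndex`, `FreshBuffer`, and `search_all` are hypothetical, and the query-time linear scan is only an assumed stand-in for whatever lightweight structure the paper uses.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase word tokens; a deliberately simple analyzer."""
    return [t.lower() for t in TOKEN.findall(text)]


class MainIndex:
    """Conventional inverted index, built in batch (expensive to update)."""

    def __init__(self, docs):
        self.postings = defaultdict(set)
        for doc_id, text in docs.items():
            for term in tokenize(text):
                self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


class FreshBuffer:
    """Hypothetical holding area for documents that arrived after the
    last batch build. Ingestion does no preprocessing; the cost is
    paid at query time instead."""

    def __init__(self):
        self.docs = {}

    def add(self, doc_id, text):
        # Constant-time ingestion: store the raw text, no tokenization.
        self.docs[doc_id] = text

    def search(self, query):
        # Query-time scan: tokenize each buffered document on demand.
        terms = set(tokenize(query))
        if not terms:
            return set()
        return {
            doc_id
            for doc_id, text in self.docs.items()
            if terms <= set(tokenize(text))
        }

    def drain(self):
        """Hand buffered documents to the next batch build and reset."""
        docs, self.docs = self.docs, {}
        return docs


def search_all(main, fresh, query):
    """Union of batch-indexed results and just-arrived results."""
    return main.search(query) | fresh.search(query)
```

In this sketch a document passed to `FreshBuffer.add` is visible to `search_all` immediately, and `drain` would be called whenever a periodic (stop-press style) rebuild absorbs the buffered documents. The scan keeps ingestion cheap but makes each query slower as the buffer grows, which is why such a buffer only makes sense for content arriving between relatively frequent rebuilds.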