Low-cost management of inverted files for online full-text search

Authors:
Giorgos Margaritis; Stergios V. Anastasiadis
Affiliations:
University of Ioannina, Ioannina, Greece; University of Ioannina, Ioannina, Greece
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Abstract

In dynamic environments with frequent content updates, we require online full-text search that scales to large data collections and achieves low search latency. Several recent methods that support fast incremental indexing of documents keep multiple partial index structures on disk and continuously update them as new documents are added. However, spreading the indexing information across multiple locations on disk tends to considerably reduce the search responsiveness of the system. In the present paper, we take a fresh look at the problem of online full-text search with consideration of the architectural features of modern systems. We introduce Selective Range Flush, a greedy method that manages the index by organizing the data on disk in fixed-size blocks and dynamically keeping the cost of data transfer between memory and disk low. Experiments with Proteus, the prototype implementation that we developed, show that we retrieve indexing information with latency matching the lowest achieved by existing methods. Additionally, we reduce the total index-building cost by 30% compared with methods of similar retrieval time.
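The abstract does not spell out how Selective Range Flush picks what to write to disk. The Python sketch below therefore only illustrates the general idea of a greedy, block-aware flush policy, under assumptions of our own. The lexicon is split into hypothetical term ranges. Flushing a range reads its existing fixed-size blocks and writes them back merged with the buffered postings. The greedy step flushes the range that frees the most memory per block transferred. The names `TermRange`, `flush_cost`, `pick_range_to_flush`, `flush_until` and the block size are illustrative. They are not taken from Proteus, and the cost model is not the paper's.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical fixed on-disk block size (bytes); not a value from the paper.
BLOCK_SIZE = 8 * 1024 * 1024


@dataclass
class TermRange:
    """A hypothetical partition of the lexicon whose postings share disk blocks."""
    name: str
    mem_bytes: int   # postings currently buffered in memory
    disk_bytes: int  # postings already stored on disk


def blocks(nbytes: int, block_size: int) -> int:
    """Number of fixed-size blocks needed to hold nbytes."""
    return math.ceil(nbytes / block_size)


def flush_cost(r: TermRange, block_size: int) -> int:
    """Assumed I/O cost in blocks: read the range's existing blocks,
    then write the merged (disk + memory) postings back as whole blocks."""
    return blocks(r.disk_bytes, block_size) + blocks(r.disk_bytes + r.mem_bytes, block_size)


def pick_range_to_flush(ranges: List[TermRange],
                        block_size: int = BLOCK_SIZE) -> Optional[TermRange]:
    """Greedy choice (assumed): the range that frees the most memory per block moved."""
    candidates = [r for r in ranges if r.mem_bytes > 0]
    if not candidates:
        return None
    # flush_cost >= 1 whenever mem_bytes > 0, so the division is safe.
    return max(candidates, key=lambda r: r.mem_bytes / flush_cost(r, block_size))


def flush_until(ranges: List[TermRange], target_mem_bytes: int,
                block_size: int = BLOCK_SIZE) -> List[str]:
    """Flush ranges greedily until buffered postings fit within target_mem_bytes.
    Returns the names of the flushed ranges in flush order."""
    flushed = []
    total = sum(r.mem_bytes for r in ranges)
    while total > target_mem_bytes:
        r = pick_range_to_flush(ranges, block_size)
        if r is None:
            break
        total -= r.mem_bytes
        r.disk_bytes += r.mem_bytes  # simulate merging the buffered postings onto disk
        r.mem_bytes = 0
        flushed.append(r.name)
    return flushed
```

For example, take a 100-byte block size and a 200-byte memory target, with three ranges. Range A has 250 bytes buffered and none on disk. B has 90 buffered and 1000 on disk. C has 150 buffered and 100 on disk. The sketch flushes A first and then C. It leaves B in memory, because flushing B's small buffer would require reading and rewriting about ten existing blocks. Keeping the cost in whole blocks reflects the abstract's point that fixed-size blocks bound the transfer cost of each flush.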