The growing amount of on-line data demands efficient parallel and distributed indexing mechanisms to manage large resource requirements and unpredictable system failures. Parallel and distributed indices built using commodity hardware such as personal computers (PCs) can save substantial cost because PCs are produced in bulk, achieving economies of scale. However, PCs have a limited amount of random access memory (RAM), so the effective utilization of RAM for in-memory inversion is crucial. This paper presents an analytical investigation and an empirical evaluation of storage-efficient in-memory extensible inverted files, which are represented by fixed- or variable-sized linked list nodes. The size of these linked list nodes is determined by minimizing storage waste or maximizing storage utilization under different conditions, which leads to different storage allocation schemes. Minimizing storage waste also reduces the number of address indirections (i.e., chainings). We evaluated our storage allocation schemes using a number of reference collections. We found that the arrival rate scheme is the best in terms of both storage utilization and the mean number of chainings per term. In our evaluation, the final storage utilization can exceed 90% if a sufficient number of documents is indexed. The mean number of chainings is not large (less than 2.6 for all the reference collections). We have also shown that our best storage allocation scheme can be used for our extensible compressed inverted file, whose final storage utilization can likewise exceed 90% in our evaluation, provided that a sufficient number of documents is indexed. The proposed storage allocation schemes can also be used by compressed extensible inverted files with word positions.
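To make the two measured quantities concrete, the following is a minimal sketch of an in-memory extensible inverted file built from fixed-size linked list nodes. It is an illustration only and not the paper's implementation: the names `Node`, `ExtensibleInvertedFile`, and `node_capacity` are hypothetical, the node size is a fixed constant rather than one chosen by the arrival rate scheme, and utilization is counted in posting slots, ignoring pointer overhead.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """One linked list node holding up to `capacity` postings (hypothetical layout)."""
    capacity: int
    postings: List[int] = field(default_factory=list)
    next: Optional["Node"] = None


class ExtensibleInvertedFile:
    """Sketch: each term maps to a chain of fixed-size nodes (an assumption)."""

    def __init__(self, node_capacity: int = 4) -> None:
        self.node_capacity = node_capacity
        self.heads: Dict[str, Node] = {}
        self.tails: Dict[str, Node] = {}

    def add(self, term: str, doc_id: int) -> None:
        tail = self.tails.get(term)
        if tail is None:
            # First posting for this term: allocate the head node.
            tail = Node(self.node_capacity)
            self.heads[term] = tail
            self.tails[term] = tail
        elif len(tail.postings) == tail.capacity:
            # Tail node is full: chain a new node (one more address indirection).
            new_node = Node(self.node_capacity)
            tail.next = new_node
            self.tails[term] = new_node
            tail = new_node
        tail.postings.append(doc_id)

    def postings(self, term: str) -> List[int]:
        out: List[int] = []
        node = self.heads.get(term)
        while node is not None:
            out.extend(node.postings)
            node = node.next
        return out

    def _nodes(self) -> Iterator[Node]:
        for head in self.heads.values():
            node: Optional[Node] = head
            while node is not None:
                yield node
                node = node.next

    def utilization(self) -> float:
        """Used posting slots divided by allocated posting slots."""
        used = sum(len(n.postings) for n in self._nodes())
        allocated = sum(n.capacity for n in self._nodes())
        return used / allocated if allocated else 0.0

    def mean_chainings(self) -> float:
        """Mean number of next-pointer hops per term (nodes minus one per term)."""
        if not self.heads:
            return 0.0
        total_nodes = sum(1 for _ in self._nodes())
        return (total_nodes - len(self.heads)) / len(self.heads)


if __name__ == "__main__":
    index = ExtensibleInvertedFile(node_capacity=2)
    for doc_id in (1, 2, 3):
        index.add("index", doc_id)
    index.add("memory", 1)
    print(index.postings("index"), index.utilization(), index.mean_chainings())
```

With a small fixed node size, frequent terms need many chainings; with a large one, rare terms leave most of their node unused. That tension is what a node-sizing scheme has to balance, and this sketch does not attempt to resolve it.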