Information Sciences: an International Journal
Multiple-disk architectures are an attractive approach to meeting high-performance I/O demands in I/O-intensive applications such as search engines, web servers, and information retrieval systems. Exploiting them requires addressing the issues of dynamic load balancing and access parallelism, which is the goal of this paper. We address the problem of document declustering in a keyword-based information retrieval system for parallel architectures consisting of a single processor and multiple disks. We propose and experimentally evaluate four similarity-based methods for declustering documents: set, multiset, vector, and Euclidean. Interestingly, our results show that for single-keyword queries as well as Boolean AND queries, the set and multiset methods generally outperform the vector and Euclidean methods, with the set method being the best for the so-called simple plan. We also introduce a highest-frequency-first retrieval scenario and compare the methods under it; the set and multiset methods remain generally superior to the other methods, with the multiset method outperforming the set method. We compare all four methods against the theoretically optimal values, which are practically impossible to achieve. Finally, we approximate the multiset method using the harmonic mean and find that the results are slightly inferior to those of the multiset method, but still better than those of the vector and Euclidean methods.
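The abstract does not define the four similarity measures or the allocation procedure, so the following is only a minimal sketch of the general idea behind similarity-based declustering: documents that share many keywords are likely to be retrieved by the same query, so placing them on different disks lets them be read in parallel. The sketch assumes Jaccard similarity as a stand-in for a set-style measure and a simple greedy placement rule; the names `jaccard` and `decluster` and the sample documents are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed set-style similarity: shared keywords divided by all keywords."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def decluster(docs: Dict[str, Set[str]], num_disks: int) -> Dict[str, int]:
    """Greedily place each document on the disk whose contents it resembles least.

    Ties are broken by the number of documents already on the disk, then by
    disk index, which keeps the load roughly balanced.
    """
    disks: List[List[str]] = [[] for _ in range(num_disks)]
    placement: Dict[str, int] = {}
    for doc_id, keywords in docs.items():
        def cost(d: int) -> Tuple[float, int, int]:
            # Total similarity between this document and those already on disk d.
            total = sum(jaccard(keywords, docs[other]) for other in disks[d])
            return (total, len(disks[d]), d)

        best = min(range(num_disks), key=cost)
        disks[best].append(doc_id)
        placement[doc_id] = best
    return placement


# Hypothetical collection: d1/d2 and d3/d4 are the similar pairs.
docs = {
    "d1": {"disk", "parallel", "index"},
    "d2": {"disk", "parallel", "query"},
    "d3": {"music", "audio"},
    "d4": {"music", "audio", "codec"},
}
placement = decluster(docs, num_disks=2)
print(placement)
```

With two disks, this sketch puts each member of a similar pair on a different disk, so a single-keyword query such as "parallel" or "music" would touch both disks rather than queueing two reads on one.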