The increasing number of documents to be indexed in many environments (Web, intranets, digital libraries) and the limitations of a single centralised index (lack of scalability, server overloading and failures) lead to the use of distributed information retrieval systems to search and locate the desired information efficiently. This work is a case study of different architectures for a distributed information retrieval system, intended as a guide for approximating the optimal architecture for a given set of resources. We analyse the performance of distributed, replicated and clustered architectures, simulating a variable number of workstations (from 1 up to 4096). A collection of approximately 94 million documents and 1 terabyte (TB) of text is used to test the performance of the different architectures. In a purely distributed information retrieval system, the brokers become the bottleneck because of the high number of local answer sets they must sort. In a replicated system, the network is the bottleneck because of the high number of query servers and the continuous data interchange with the brokers. Finally, we demonstrate that a clustered system outperforms a replicated system when a high number of query servers is used, essentially because it reduces the network load. However, a change in the distribution of the users' queries could reduce the performance of a clustered system.
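To illustrate why the broker's sorting work grows with the number of query servers, here is a minimal sketch of a broker merging local answer sets into a global top-k ranking. It assumes each query server returns its local answer set as `(score, doc_id)` pairs already sorted by descending score; the function name `broker_merge` and this data layout are hypothetical and are not taken from the simulator used in this work.

```python
import heapq
from itertools import islice
from typing import List, Tuple

# Hypothetical answer format: (score, doc_id), higher score = more relevant.
Answer = Tuple[float, int]


def broker_merge(local_answers: List[List[Answer]], k: int) -> List[Answer]:
    """Merge per-server ranked answer sets into one global top-k list.

    Assumption: every local list is already sorted by descending score.
    The broker must consider the heads of all local lists, so its work
    grows with the number of query servers answering the query.
    """
    # heapq.merge performs a lazy k-way merge; reverse=True keeps the
    # output in descending score order, matching the sorted inputs.
    merged = heapq.merge(*local_answers, key=lambda a: a[0], reverse=True)
    # Stop after the first k answers instead of sorting everything.
    return list(islice(merged, k))


# Three hypothetical query servers, each returning its local ranking.
servers = [[(0.9, 1), (0.4, 7)], [(0.8, 3), (0.7, 5)], [(0.95, 2)]]
print(broker_merge(servers, 3))  # [(0.95, 2), (0.9, 1), (0.8, 3)]
```

A heap-based merge keeps per-answer cost logarithmic in the number of servers, but the broker still has to receive and inspect every local answer set. That is the load the abstract identifies as the bottleneck of a purely distributed system.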