The information explosion across the Internet and elsewhere offers access to an increasing number of document collections. In order for users to effectively access these collections, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this article, we explore how to achieve scalable performance in a distributed system for collection sizes ranging from 1GB to 128GB. We implement a fully functional distributed IR system based on a multithreaded version of the Inquery simulation model. We measure performance as a function of system parameters such as client command rate, number of document collections, terms per query, query term frequency, number of answers returned, and command mixture. Our results show that it is important to model both query and document commands because the heterogeneity of commands significantly impacts performance. Based on our results, we recommend simple changes to the prototype and evaluate the changes using the simulator. Because of the significant resource demands of information retrieval, it is not difficult to generate workloads that overwhelm system resources regardless of the architecture. However, under some realistic workloads, we demonstrate system organizations for which response time degrades gracefully as the workload increases and performance scales with the number of processors. This scalable architecture includes a surprisingly small number of brokers through which a large number of clients and servers communicate.