Nutch is an open source search engine that is growing in popularity in the commercial world. The Nutch architecture lends itself to a wide range of parallelization techniques. Multiple backend servers can be used both to partition the corpus of search data, thus increasing the rate at which queries are serviced, and to increase the size of the search data while preserving the service rate. Alternatively, multiple search engines can operate in parallel, further increasing the query rate. In this paper, we analyze the performance and scalability of various configurations of Nutch. The configurations were implemented as part of the Commercial Scale Out project at IBM Research, and were used to investigate the applicability of scale-out architectures in commercial environments. We conclude that Nutch is highly scalable, with the different configurations exhibiting different performance characteristics.
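The two forms of parallelism described above (partitioning the corpus across backend servers, and running several search engines side by side) can be illustrated with a minimal scatter-gather sketch. This is not Nutch's actual code or API: the `Backend`, `Frontend`, and `Dispatcher` classes and the term-frequency scoring are hypothetical, chosen only to show the idea. Each backend holds one partition and returns its local top-k hits; a frontend queries all backends and merges their lists into a global top-k; a dispatcher spreads queries round-robin over replicated frontends.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float


def rank_key(hit: Hit) -> tuple:
    # One total order used everywhere, so merging local top-k lists is exact.
    return (hit.score, hit.doc_id)


class Backend:
    """Hypothetical backend server holding one partition of the corpus."""

    def __init__(self, docs: dict):
        self.docs = docs  # doc_id -> document text

    def search(self, term: str, k: int) -> list:
        # Toy scoring (assumption): raw term frequency in the document.
        hits = [Hit(d, float(text.split().count(term))) for d, text in self.docs.items()]
        return heapq.nlargest(k, [h for h in hits if h.score > 0], key=rank_key)


class Frontend:
    """Hypothetical search frontend: scatter a query, gather the global top-k."""

    def __init__(self, backends: list):
        self.backends = backends

    def search(self, term: str, k: int) -> list:
        with ThreadPoolExecutor(max_workers=len(self.backends)) as pool:
            partials = pool.map(lambda b: b.search(term, k), self.backends)
            # The global top-k is always contained in the union of local top-k lists.
            merged = [h for hits in partials for h in hits]
        return heapq.nlargest(k, merged, key=rank_key)


class Dispatcher:
    """Hypothetical load balancer over replicated frontends (not thread-safe)."""

    def __init__(self, frontends: list):
        self._next = itertools.cycle(frontends)

    def search(self, term: str, k: int) -> list:
        return next(self._next).search(term, k)
```

Usage, with two partitions served by two replicated frontends:

```python
backends = [
    Backend({"a1": "nutch search engine", "a2": "search search"}),
    Backend({"b1": "lucene search", "b2": "map reduce"}),
]
dispatcher = Dispatcher([Frontend(backends), Frontend(backends)])
print([h.doc_id for h in dispatcher.search("search", 2)])  # ['a2', 'b1']
```

Partitioning keeps each backend's index small, so per-query work on each server shrinks as partitions are added; replicating frontends instead multiplies the number of queries that can be accepted concurrently. Which effect dominates in practice is exactly what the measured configurations are meant to reveal.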