Performance Analysis of a Distributed Question/Answering System

Authors:
Mihai Surdeanu;Dan I. Moldovan;Sanda M. Harabagiu
Affiliations:
Language Computer Corporation, Dallas, TX;Univ. of Texas at Dallas, Richardson, TX;Univ. of Texas at Dallas, Richardson, TX
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2002

Abstract

The problem of question/answering (Q/A) is to find answers to open-domain questions by searching large collections of documents. Unlike information retrieval systems, now common in the form of Internet search engines, Q/A systems do not retrieve documents but instead provide short, relevant answers located in small fragments of text. This enhanced functionality comes at a price: Q/A systems are significantly slower and require more hardware resources than information retrieval systems. This paper proposes a distributed Q/A architecture that increases system throughput by exploiting interquestion parallelism and dynamic load balancing, and reduces individual question response time by exploiting intraquestion parallelism. Inter- and intraquestion parallelism are both exploited using several scheduling points: one before the Q/A task is started and two embedded in the Q/A task. An analytical performance model is introduced. The model analyzes both the interquestion parallelism overhead generated by the migration of questions and the intraquestion parallelism overhead generated by the partitioning of the Q/A task. The analytical model indicates that both question migration and partitioning are required for a high-performance system: intraquestion parallelism leads to significant speedup of individual questions, but it is practical only up to about 90 processors, depending on the system parameters. The exploitation of intertask parallelism provides a scalable way to improve system throughput. The distributed Q/A system has been implemented on a network of 16 Pentium III computers. The experimental results indicate that, at high system load, the dynamic load balancing strategy proposed in this paper outperforms two other traditional approaches. At low system load, the distributed Q/A system reduces question response times through task partitioning, by factors close to those predicted by the analytical model.
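The two kinds of parallelism named in the abstract can be illustrated with a minimal sketch. This is not the paper's scheduler: the abstract does not give the load metric, the migration policy, or the partitioning rule, so the greedy least-loaded dispatch and the even contiguous split below are assumptions chosen only for illustration, and the names `dispatch` and `partition` are hypothetical.

```python
import heapq


def dispatch(question_costs, num_nodes):
    """Interquestion parallelism (illustrative assumption, not the paper's policy):
    send each incoming question to the node with the smallest accumulated load."""
    # Min-heap of (accumulated load, node id); ties go to the lower node id.
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = []
    loads = [0.0] * num_nodes
    for cost in question_costs:
        load, node = heapq.heappop(heap)
        assignment.append(node)
        loads[node] = load + cost
        heapq.heappush(heap, (loads[node], node))
    return assignment, loads


def partition(items, num_parts):
    """Intraquestion parallelism (illustrative assumption): split the work items
    of one question into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_parts)
    chunks, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

In this sketch, `dispatch` stands in for the scheduling point before a Q/A task starts, and `partition` stands in for the scheduling points embedded in the task, where one question's work is divided among several processors. A real system would also have to account for the migration and partitioning overheads that the analytical model studies.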