Load Balancing of Parallelized Information Filters

Authors:
N. C. Rowe; A. Zaky
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2002

Abstract

We investigate the data-parallel implementation of a set of information filters used to rule out uninteresting data from a database or data stream. We develop an analytic model of the costs and benefits of load rebalancing among the parallel filtering processes, as well as a quick heuristic for deciding when rebalancing is desirable. Our model uses binomial models of the filter processes and fits key parameters to the results of extensive simulations. Experiments confirm our model. Rebalancing should pay off whenever processor communication costs are high. Further experiments showed that it can also pay off with low communication costs for 16-64 processors and 1-10 data items per processor; in that range, imbalances can increase processing time by up to 52 percent in representative cases, and rebalancing can itself increase processing time by 78 percent, so our quick predictive model can be valuable. Results also show that our proposed heuristic rebalancing criterion gives close to optimal balancing. We also extend our model to handle variations in filter processing time per data item.
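
To make the binomial view of filtering concrete, the following is a minimal Python sketch and not the model or heuristic from the paper. It assumes every processor starts with the same number of items and that each item passes a filter independently with a fixed probability, so the surviving count on each processor is binomially distributed. All function names, the per-item cost, and the flat `rebalance_cost` parameter are hypothetical, chosen only for illustration.

```python
import random


def surviving_counts(num_processors, items_per_processor, pass_prob, rng):
    """Items left on each processor after one filter.

    Assumption: items pass independently with probability pass_prob,
    so each count is a Binomial(items_per_processor, pass_prob) draw.
    """
    return [
        sum(1 for _ in range(items_per_processor) if rng.random() < pass_prob)
        for _ in range(num_processors)
    ]


def rebalance(counts):
    """Spread items as evenly as possible (counts differ by at most one)."""
    base, extra = divmod(sum(counts), len(counts))
    return [base + 1 if i < extra else base for i in range(len(counts))]


def step_time(counts, cost_per_item=1.0):
    """A data-parallel step lasts as long as the busiest processor."""
    return max(counts) * cost_per_item


def imbalance_overhead(counts):
    """Fractional increase in step time relative to a balanced layout."""
    balanced_time = step_time(rebalance(counts))
    if balanced_time == 0:
        return 0.0
    return (step_time(counts) - balanced_time) / balanced_time


def should_rebalance(counts, rebalance_cost):
    """Hypothetical criterion: rebalance when the time saved on the next
    filter exceeds a flat, caller-supplied rebalancing cost."""
    saved = step_time(counts) - step_time(rebalance(counts))
    return saved > rebalance_cost


if __name__ == "__main__":
    rng = random.Random(0)
    counts = surviving_counts(16, 10, 0.5, rng)
    print("counts after filter:", counts)
    print("imbalance overhead: {:.0%}".format(imbalance_overhead(counts)))
    print("rebalance at cost 1.0?", should_rebalance(counts, 1.0))
```

With few items per processor, the busiest processor can sit well above the mean, which is the regime where the abstract reports imbalance mattering. The sketch leaves the rebalancing cost abstract because the abstract does not state how that cost is computed.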