We investigate the data-parallel implementation of a set of information filters used to rule out uninteresting data from a database or data stream. We develop an analytic model for the costs and advantages of load rebalancing for the parallel filtering processes, as well as a quick heuristic for its desirability. Our model uses binomial models of the filter processes and fits key parameters to the results of extensive simulations. Experiments confirm our model. Rebalancing should pay off whenever processor communications costs are high. Further experiments showed it can also pay off even with low communications costs for 16-64 processors and 1-10 data items per processor; in that regime, imbalances can increase processing time by up to 52 percent in representative cases, and rebalancing can increase it by 78 percent, so our quick predictive model can be valuable. Results also show that our proposed heuristic rebalancing criterion gives close to optimal balancing. We also extend our model to handle variations in filter processing time per data item.
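To illustrate why imbalance arises under a binomial filter model, here is a minimal sketch, not the authors' analytic model or heuristic. It assumes each processor starts with the same number of items, each item independently passes a filter with a fixed probability, and the next filter costs one time unit per surviving item. The communication cost of rebalancing is deliberately ignored, and all function names and parameters are hypothetical.

```python
import math
import random


def surviving_counts(num_procs, items_per_proc, pass_prob, rng):
    """Draw, for each processor, how many of its items pass one filter.

    Each count is a Binomial(items_per_proc, pass_prob) sample.
    """
    return [
        sum(1 for _ in range(items_per_proc) if rng.random() < pass_prob)
        for _ in range(num_procs)
    ]


def step_times(counts):
    """Return (unbalanced, balanced) time for the next filter, in item units.

    Unbalanced: the slowest processor sets the pace (max count).
    Balanced: items are spread evenly, ignoring the cost of moving them.
    """
    unbalanced = max(counts)
    balanced = math.ceil(sum(counts) / len(counts))
    return unbalanced, balanced


def mean_imbalance_penalty(num_procs, items_per_proc, pass_prob,
                           trials=2000, seed=0):
    """Estimate the fractional extra time caused by not rebalancing."""
    rng = random.Random(seed)
    total_unbalanced = 0
    total_balanced = 0
    for _ in range(trials):
        counts = surviving_counts(num_procs, items_per_proc, pass_prob, rng)
        unbalanced, balanced = step_times(counts)
        total_unbalanced += unbalanced
        total_balanced += balanced
    if total_balanced == 0:
        return 0.0  # nothing survived in any trial, so there is no work to compare
    return total_unbalanced / total_balanced - 1.0


if __name__ == "__main__":
    # Parameter ranges mirror those named in the abstract; pass_prob is an assumption.
    for procs in (16, 64):
        for items in (1, 10):
            penalty = mean_imbalance_penalty(procs, items, pass_prob=0.5)
            print(f"procs={procs} items/proc={items} penalty={penalty:.2f}")
```

A real rebalancing decision would compare this penalty against the cost of moving items between processors, which is the trade-off the model in the abstract addresses.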