A generic parallel processing model for facilitating data mining and integration

  • Authors:
  • Liangxiu Han; Chee Sun Liew; Jano van Hemert; Malcolm Atkinson

  • Affiliations:
  • School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK; School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK and Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Mala ...; School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK; School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK

  • Venue:
  • Parallel Computing
  • Year:
  • 2011

Quantified Score

Hi-index 0.01

Abstract

To facilitate data mining and integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be held in memory, buffered on disk, or carried as inter-computer data flows. This makes it possible to build arbitrary DAGs that combine pipelining with both data and task parallelism, providing room for performance enhancement. We have applied this approach to a real DMI case in the life sciences and implemented a prototype. To demonstrate the feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental results show that, in this case study, speedup is linear in the number of distributed computing nodes.
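
The composition idea in the abstract, PEs linked by data streams, can be illustrated with a minimal sketch. This is not the authors' prototype: it assumes in-memory streams only, models each stream as a bounded `queue.Queue`, runs each PE in its own thread, and covers only a linear chain, which is the simplest special case of a DAG. The names `run_pe`, `run_pipeline`, and `_END` are hypothetical and chosen for illustration.

```python
import queue
import threading

# Hypothetical sentinel that marks the end of a stream.
_END = object()


def run_pe(func, inbox, outbox):
    """One Processing Element: apply func to each item and forward the result."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # propagate end-of-stream downstream
            return
        outbox.put(func(item))


def run_pipeline(source, funcs):
    """Chain one PE per function; PEs run concurrently, linked by bounded streams."""
    # One stream before the first PE, one after each PE.
    streams = [queue.Queue(maxsize=8) for _ in range(len(funcs) + 1)]

    def feed():
        # Push the source items into the first stream, then close it.
        for item in source:
            streams[0].put(item)
        streams[0].put(_END)

    workers = [threading.Thread(target=feed)]
    workers += [
        threading.Thread(target=run_pe, args=(f, streams[i], streams[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()

    # Drain the last stream in the calling thread until end-of-stream.
    results = []
    while True:
        item = streams[-1].get()
        if item is _END:
            break
        results.append(item)

    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))
```

Because every stream is bounded, a slow downstream PE applies back-pressure to the PEs before it, while each PE can start work as soon as its first input arrives; that overlap is the pipelining the abstract refers to. Data parallelism (several copies of one PE sharing a stream) and disk-buffered or inter-computer streams are outside the scope of this sketch.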