P-Biblio-MetReS, a parallel data mining tool for the reconstruction of molecular networks

  • Authors:
  • Ivan Teixidó;Anabel Usié;Josep Ll. Lérida;Francesc Solsona;Jorge Comas;Nestor Torres;Hiren Karathia;Rui Alves

  • Affiliations:
  • University of Lleida, Lleida, Spain;University of Lleida, Lleida, Spain;University of Lleida, Lleida, Spain;University of Lleida, Lleida, Spain;University of Lleida, Lleida, Spain;University of Lleida, Lleida, Spain;University of Lleida, Lleida, Spain;University of Lleida, Lleida, Spain

  • Venue:
  • Proceedings of the 20th European MPI Users' Group Meeting
  • Year:
  • 2013

Abstract

Biblio-MetReS is a single-threaded data mining application that facilitates the reconstruction of molecular networks through automated text mining of published scientific literature. The application is very CPU-intensive, requiring High Performance Computing (HPC), and the large number of tasks it executes can make it quite slow. Those tasks are repetitive and consist of mining information from large sets of scientific documents, a process whose run time can be reduced through parallelization. This paper presents a parallel version of Biblio-MetReS. The multithreaded application P(arallel)-Biblio-MetReS distributes the work among copies of the same Java class, each mining a collection of documents obtained in a previous search phase from different literature sources on the Internet. In this article, we compare the performance of the parallel and non-parallel versions of the application and discuss scalability issues on multithreaded systems in the context of this application. Furthermore, we optimize memory management and the reuse of document parsing results. Our experimental results corroborate the good performance of P-Biblio-MetReS and pinpoint specific aspects that still need to be improved.
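
The work distribution described in the abstract, where several copies of the same Java class each mine part of a document collection, can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `ParallelMiningSketch` and `MiningTask`, the method `mineAll`, and the use of simple term counting as a stand-in for text mining are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMiningSketch {

    // Hypothetical worker: every thread runs a copy of this class on its own
    // share of the documents. Term counting stands in for real text mining.
    static class MiningTask implements Callable<Map<String, Integer>> {
        private final List<String> documents;
        private final Set<String> terms;

        MiningTask(List<String> documents, Set<String> terms) {
            this.documents = documents;
            this.terms = terms;
        }

        @Override
        public Map<String, Integer> call() {
            Map<String, Integer> counts = new HashMap<>();
            for (String doc : documents) {
                for (String token : doc.split("\\s+")) {
                    if (terms.contains(token)) {
                        counts.merge(token, 1, Integer::sum);
                    }
                }
            }
            return counts;
        }
    }

    // Splits the documents into contiguous chunks, mines each chunk in its
    // own task, and merges the partial counts into one result.
    static Map<String, Integer> mineAll(List<String> documents, Set<String> terms,
                                        int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            int chunk = (documents.size() + threads - 1) / threads; // ceiling division
            for (int start = 0; start < documents.size(); start += chunk) {
                int end = Math.min(start + chunk, documents.size());
                futures.add(pool.submit(new MiningTask(documents.subList(start, end), terms)));
            }
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> future : futures) {
                future.get().forEach((term, n) -> total.merge(term, n, Integer::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of(
                "geneA binds geneB",
                "geneB inhibits geneA geneA",
                "geneC is unrelated");
        System.out.println(mineAll(docs, Set.of("geneA", "geneB"), 2));
    }
}
```

Each task in this sketch keeps its own partial result and the merge happens only after the tasks finish, so the workers share no mutable state. A real mining pipeline would likely need to balance chunks by document size rather than by document count.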