Distributed classification of textual documents on the grid

  • Authors:
  • Ivan Janciak;Martin Sarnovsky;A Min Tjoa;Peter Brezany

  • Affiliations:
  • Institute of Scientific Computing, University of Vienna, Vienna, Austria;Department of Cybernetics and Artificial Intelligence, Technical University of Kosice, Kosice, Slovakia;Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria;Institute of Scientific Computing, University of Vienna, Vienna, Austria

  • Venue:
  • HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
  • Year:
  • 2006

Abstract

Efficient access to information, integration of information from various sources, and the transformation of this information into knowledge are currently major challenges in life science research. However, a large fraction of this information is available only from scientific articles stored in huge document databases in free-text format, or from the Web, where it is available in semi-structured format. Text mining provides methods (e.g., classification and clustering) that can automatically extract relevant knowledge patterns contained in free-text data. Including Grid text-mining services in a Grid-based knowledge discovery system can significantly support problem-solving processes based on such a system. The motivation for the research effort presented in this paper is to use the Grid's computational, storage, and data access capabilities for text mining tasks, and for text classification in particular. Text classification methods are time-consuming, and utilizing the Grid infrastructure can bring significant benefits. Implementing text mining techniques in a distributed environment allows us to access different geographically distributed data collections and to perform text mining tasks in a parallel/distributed fashion.