Distributed classification of textual documents on the grid

Authors:
Ivan Janciak; Martin Sarnovsky; A Min Tjoa; Peter Brezany
Affiliations:
Institute of Scientific Computing, University of Vienna, Vienna, Austria; Department of Cybernetics and Artificial Intelligence, Technical University of Kosice, Kosice, Slovakia; Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria; Institute of Scientific Computing, University of Vienna, Vienna, Austria
Venue:
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Year:
2006

Abstract

Efficient access to information, integration of information from various sources, and the transformation of this information into knowledge are currently major challenges in life science research. However, a large fraction of this information is available only from scientific articles stored in huge document databases in free-text format, or from the Web, where it is available in semi-structured format. Text mining provides methods (e.g., classification and clustering) that can automatically extract relevant knowledge patterns contained in free-text data. Including Grid text-mining services in a Grid-based knowledge discovery system can significantly support the problem-solving processes based on such a system. The motivation for the research presented in this paper is to use the Grid's computational, storage, and data access capabilities for text mining tasks, and for text classification in particular. Text classification methods are time-consuming, and utilizing the Grid infrastructure can bring significant benefits. Implementing text mining techniques in a distributed environment allows us to access different geographically distributed data collections and to perform text mining tasks in a parallel/distributed fashion.