Distributed text classification with an ensemble kernel-based learning approach

Authors:
Catarina Silva;Uroš Lotrič;Bernardete Ribeiro;Andrej Dobnikar
Affiliations:
Department of Informatics Engineering, School of Technology and Management, Polytechnic Institute of Leiria, Leiria, Portugal and Center of Informatics and Systems, University of Coimbra, Coimbra, ...;Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia;Department of Informatics Engineering, Center of Informatics and Systems, University of Coimbra, Coimbra, Portugal;Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
Year:
2010



Abstract

Constructing a single text classifier that excels in every application is an unrealistic goal. As a result, ensemble systems are becoming an important resource, since they permit the use of simpler classifiers and the integration of different knowledge in the learning process. However, many text-classification ensemble approaches carry an extremely high computational burden, which limits their use in real environments. Moreover, state-of-the-art kernel-based classifiers, such as support vector machines and relevance vector machines, demand substantial resources when applied to large databases. To tackle these challenges, we propose a new systematic distributed ensemble framework based on a generic deployment strategy in a distributed cluster environment. We combine task and data decomposition of the text-classification system, using partitioning, communication, agglomeration, and mapping to define and optimize a graph of dependent tasks. Additionally, the framework includes an ensemble system in which we exploit diverse patterns of errors and gain from the synergies between the ensemble classifiers. The ensemble data-partitioning strategy is shown to improve the performance of baseline state-of-the-art kernel-based machines. The experimental results show that the proposed framework outperforms standard methods in both speed and classification performance.
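
The data-partitioned ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a toy nearest-centroid learner stands in for the support vector and relevance vector machines, a thread pool stands in for the cluster nodes, and majority voting is assumed as the combination rule. All function names and the toy two-feature documents are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(data, k):
    """Data decomposition: split the training set into k interleaved chunks."""
    return [data[i::k] for i in range(k)]


def train_centroid(chunk):
    """Toy stand-in for a kernel machine: one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in chunk:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_one(model, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: dist(model[y]))


def train_ensemble(data, k):
    """Train one base classifier per chunk; each training call is an
    independent task, so a real deployment could map it to its own node."""
    chunks = partition(data, k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        return list(pool.map(train_centroid, chunks))


def predict_ensemble(models, x):
    """Combine the base classifiers by majority vote (assumed rule)."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical documents as two-term weight vectors with a category label.
data = [
    ([1.0, 0.0], "sports"), ([0.9, 0.1], "sports"), ([0.8, 0.2], "sports"),
    ([0.0, 1.0], "finance"), ([0.1, 0.9], "finance"), ([0.2, 0.8], "finance"),
]
models = train_ensemble(data, 3)
label = predict_ensemble(models, [0.95, 0.05])
```

Because the base classifiers share no state during training, the chunks can be processed concurrently and only the trained models need to be gathered for voting. On an actual cluster the thread pool would be replaced by processes or message passing, and the communication and agglomeration costs that the framework optimizes would then matter.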