Text categorization using an ensemble classifier based on a mean co-association matrix

Authors:
Luís Moreira-Matias;João Mendes-Moreira;João Gama;Pavel Brazdil
Affiliations:
Departamento de Engenharia Informática, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal, LIAAD-INESC Porto L.A., Porto, Portugal;Departamento de Engenharia Informática, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal, LIAAD-INESC Porto L.A., Porto, Portugal;LIAAD-INESC Porto L.A., Porto, Portugal, Faculdade de Economia, Universidade do Porto, Porto, Portugal;LIAAD-INESC Porto L.A., Porto, Portugal, Faculdade de Economia, Universidade do Porto, Porto, Portugal
Venue:
MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2012

Abstract

Text Categorization (TC) has attracted the attention of the research community over the last decade. Algorithms such as Support Vector Machines, Naïve Bayes and k-Nearest Neighbors have been used with good performance, as confirmed by several comparative studies. Recently, several ensemble classifiers were also introduced in TC. However, many of those can only provide a category for a given new sample. In this paper, we instead propose MECAC, a methodology for building an ensemble of classifiers that has two advantages over other ensemble methods: 1) it can be run using parallel computing, saving processing time, and 2) it can extract important statistics from the obtained clusters. It uses the mean co-association matrix to solve binary TC problems. Our experiments revealed that our framework performed, on average, 2.04% better than the best individual classifier on the tested datasets. These results were statistically validated at a significance level of 0.05 using the Friedman test.
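The abstract does not spell out how the mean co-association matrix is built, so the following is only a minimal sketch of the general idea, under stated assumptions: each base classifier assigns a binary label to every document, a classifier's co-association matrix has a 1 at entry (i, j) when documents i and j received the same label, and the ensemble averages those matrices. The function names and the toy predictions are hypothetical and are not taken from the paper; MECAC itself may differ in its details.

```python
from typing import List


def co_association(labels: List[int]) -> List[List[float]]:
    """Entry (i, j) is 1.0 when samples i and j received the same label, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mean_co_association(predictions: List[List[int]]) -> List[List[float]]:
    """Average the co-association matrices of all base classifiers.

    Assumption: predictions[k][i] is the label that classifier k gave to sample i.
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n = len(predictions[0])
    if any(len(labels) != n for labels in predictions):
        raise ValueError("every classifier must label the same samples")

    total = [[0.0] * n for _ in range(n)]
    for labels in predictions:
        # Each matrix depends on one classifier only, so this loop
        # could be distributed across workers.
        matrix = co_association(labels)
        for i in range(n):
            for j in range(n):
                total[i][j] += matrix[i][j]

    k = len(predictions)
    return [[value / k for value in row] for row in total]


# Hypothetical binary predictions of three base classifiers on four documents.
predictions = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]
mean_matrix = mean_co_association(predictions)
```

Each entry of `mean_matrix` is the fraction of base classifiers that placed the two documents in the same category. Values near 1 indicate strong agreement, and values near 0 indicate that the classifiers consistently separated the pair.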