Text Categorization (TC) has attracted considerable attention from the research community over the last decade. Algorithms such as Support Vector Machines, Naïve Bayes, and k-Nearest Neighbors have been used with good performance, as confirmed by several comparative studies. More recently, several ensemble classifiers have also been introduced in TC. However, many of those can only provide a category for a given new sample. In this paper, we instead propose a methodology, MECAC, for building an ensemble of classifiers that has two advantages over other ensemble methods: 1) it can be run using parallel computing, saving processing time, and 2) it can extract important statistics from the obtained clusters. It uses the mean co-association matrix to solve binary TC problems. Our experiments revealed that our framework performed, on average, 2.04% better than the best individual classifier on the tested datasets. These results were statistically validated at a significance level of 0.05 using the Friedman test.
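The abstract does not spell out how MECAC builds or uses its mean co-association matrix, so the following is only a minimal sketch of the general idea, assuming the definition commonly used in the cluster-ensemble literature: entry (i, j) is the fraction of base classifiers that assign samples i and j to the same class. The function names and the toy predictions are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def co_association(labels: Sequence[int]) -> List[List[float]]:
    """Return an n x n matrix with 1.0 where two samples share a label, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mean_co_association(predictions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Average the co-association matrices of several base classifiers.

    `predictions` holds one label sequence per classifier, all over the
    same samples (assumed layout, chosen for illustration).
    """
    if not predictions:
        raise ValueError("at least one classifier output is required")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must label the same samples")

    total = [[0.0] * n for _ in range(n)]
    for labels in predictions:
        single = co_association(labels)
        for i in range(n):
            for j in range(n):
                total[i][j] += single[i][j]

    k = len(predictions)
    return [[value / k for value in row] for row in total]


# Toy binary outputs of three hypothetical classifiers on four documents.
predictions = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]
matrix = mean_co_association(predictions)
```

Each per-classifier matrix depends only on that classifier's output, so in a sketch like this the matrices could be computed independently and averaged afterwards, which is consistent with the parallelism the abstract mentions. How MECAC turns the averaged matrix into final category decisions is not described here and is left out.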