Document classification has already been widely studied. Some studies compared feature selection techniques or feature space transformations, whereas others compared the performance of different algorithms. Recently, following the rising interest in the Support Vector Machine (SVM), various studies showed that the SVM outperforms other classification algorithms. So should we simply ignore other classification algorithms and always opt for SVM? We decided to investigate this issue and compared SVM to kNN and naive Bayes on binary classification tasks. An important issue is to compare optimized versions of these algorithms, which is what we have done. Our results show that all the classifiers achieved comparable performance on most problems. One surprising result is that SVM was not a clear winner, despite quite good overall performance. If suitable preprocessing is used with kNN, this algorithm continues to achieve very good results and scales up well with the number of documents, which is not the case for SVM. As for naive Bayes, it also achieved good performance.