Many real-world machine learning tasks must cope with small training sets. In addition, the class distribution of the training set often does not match the target distribution. In this paper we compare the performance of many learning models on a substantial benchmark of binary text classification tasks with small training sets. We vary both the training set size and the class distribution to examine the learning surface, that is, performance as a function of both factors, rather than the traditional learning curve, which varies training set size alone. The models tested pair various feature selection methods with each of four learning algorithms: Support Vector Machines (SVM), Logistic Regression, Naive Bayes, and Multinomial Naive Bayes. Different models excel in different regions of the learning surface, which yields meta-knowledge about which model to apply in which situation. This meta-knowledge can guide researchers and practitioners in choosing models and feature selection methods, for example in information retrieval and other settings.
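To make the idea of a learning surface concrete, here is a minimal sketch of how one could be computed: for each combination of positive and negative training set sizes, sample several training sets, select features, train a classifier, and average a test score. This is not the paper's experimental code. Several choices are assumptions made for illustration only: information gain stands in for "various feature selection methods" (the abstract does not name them), a hand-written Multinomial Naive Bayes is the only learner, accuracy is the score, and the corpus is a hypothetical synthetic one. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def information_gain(docs, labels, word):
    """IG of the binary feature 'word occurs in the document' for a 0/1 label."""
    def entropy(pos, total):
        if total == 0 or pos in (0, total):
            return 0.0
        p = pos / total
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    n = len(docs)
    with_w = [y for d, y in zip(docs, labels) if word in d]
    without_w = [y for d, y in zip(docs, labels) if word not in d]
    conditional = (len(with_w) / n) * entropy(sum(with_w), len(with_w)) + (
        len(without_w) / n
    ) * entropy(sum(without_w), len(without_w))
    return entropy(sum(labels), n) - conditional


def select_features(docs, labels, k):
    """Assumption: IG as the feature selector; keep the top k words (ties alphabetical)."""
    vocab = sorted({w for d in docs for w in d})
    ranked = sorted(vocab, key=lambda w: (-information_gain(docs, labels, w), w))
    return set(ranked[:k])


def train_mnb(docs, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to vocab."""
    model = {}
    for c in (0, 1):
        counts = Counter(w for d, y in zip(docs, labels) if y == c for w in d if w in vocab)
        total = sum(counts.values()) + alpha * len(vocab)
        log_prior = math.log(labels.count(c) / len(labels))
        model[c] = (log_prior, {w: math.log((counts[w] + alpha) / total) for w in vocab})
    return model


def predict_mnb(model, doc):
    """Return the class with the highest log posterior; unseen words are ignored."""
    scores = {c: lp + sum(ll[w] for w in doc if w in ll) for c, (lp, ll) in model.items()}
    return max(scores, key=scores.get)


def learning_surface(pos_pool, neg_pool, test_docs, test_labels,
                     pos_sizes, neg_sizes, k=10, trials=5, seed=0):
    """Mean test accuracy for every (num_positives, num_negatives) grid point."""
    rng = random.Random(seed)
    surface = {}
    for p in pos_sizes:
        for n in neg_sizes:
            accs = []
            for _ in range(trials):
                train = rng.sample(pos_pool, p) + rng.sample(neg_pool, n)
                labels = [1] * p + [0] * n
                model = train_mnb(train, labels, select_features(train, labels, k))
                correct = sum(predict_mnb(model, d) == y for d, y in zip(test_docs, test_labels))
                accs.append(correct / len(test_docs))  # assumption: accuracy as the metric
            surface[(p, n)] = sum(accs) / trials
    return surface


def make_doc(rng, topic_words, shared_words, length=8):
    """Hypothetical toy document: each token is a topic word with probability 0.5."""
    return [rng.choice(topic_words if rng.random() < 0.5 else shared_words) for _ in range(length)]


def demo():
    rng = random.Random(42)
    shared = ["the", "a", "of", "and", "to"]
    pos = [make_doc(rng, ["goal", "team", "match"], shared) for _ in range(50)]
    neg = [make_doc(rng, ["stock", "bank", "market"], shared) for _ in range(200)]
    test_docs = pos[40:] + neg[150:]  # 10 positives and 50 negatives held out
    test_labels = [1] * 10 + [0] * 50
    return learning_surface(pos[:40], neg[:150], test_docs, test_labels,
                            pos_sizes=(1, 5, 20), neg_sizes=(5, 20, 80))


if __name__ == "__main__":
    for (p, n), acc in sorted(demo().items()):
        print(f"positives={p:2d} negatives={n:2d} accuracy={acc:.2f}")
```

Varying the positive and negative counts independently, rather than only the total size, is what turns a learning curve into a surface: the same total training size can sit at very different class distributions. A fuller replication would swap in each feature selection method and each of the four learners and compare their surfaces point by point.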