Learning to construct knowledge bases from the World Wide Web
Artificial Intelligence - Special issue on Intelligent internet systems
An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
A vector space model for automatic indexing
Communications of the ACM
Sliding-window filtering: an efficient algorithm for incremental mining
Proceedings of the tenth international conference on Information and knowledge management
A Study of Approaches to Hypertext Categorization
Journal of Intelligent Information Systems
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Toward Optimal Active Learning through Sampling Estimation of Error Reduction
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Partially Supervised Classification of Text Documents
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Less is More: Active Learning with Support Vector Machines
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
PEBL: positive example based learning for Web page classification using SVM
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Support vector machine active learning with applications to text classification
The Journal of Machine Learning Research
Tagging English text with a probabilistic model
Computational Linguistics
Improving text categorization using the importance of sentences
Information Processing and Management: an International Journal
Using the feature projection technique based on a normalized voting method for text classification
Information Processing and Management: an International Journal
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Learning to classify texts using positive and unlabeled data
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
A comparison of methods for multiclass support vector machines
IEEE Transactions on Neural Networks
The data complexity index to construct an efficient cross-validation method
Decision Support Systems
Hi-index | 0.00 |
Automatic text classification is the problem of automatically assigning predefined categories to free text documents, thus allowing for less manual labors required by traditional classification methods. When we apply binary classification to multi-class classification for text classification, we usually use the one-against-the-rest method. In this method, if a document belongs to a particular category, the document is regarded as a positive example of that category; otherwise, the document is regarded as a negative example. Finally, each category has a positive data set and a negative data set. But, this one-against-the-rest method has a problem. That is, the documents of a negative data set are not labeled manually, while those of a positive set are labeled by human. Therefore, the negative data set probably includes a lot of noisy data. In this paper, we propose that the sliding window technique and the revised EM (Expectation Maximization) algorithm are applied to binary text classification for solving this problem. As a result, we can improve binary text classification through extracting potentially noisy documents from the negative data set using the sliding window technique and removing actually noisy documents using the revised EM algorithm. The results of our experiments showed that our method achieved better performance than the original one-against-the-rest method in all the data sets and all the classifiers used in the experiments.