We present a Life-Long Learning from Mistakes (3LM) algorithm for document classification, which can be used in scenarios such as spam filtering, blog classification, and web resource categorization. We extend the ideas of online clustering and batch-mode centroid-based classification to online learning with negative feedback. 3LM is a competitive learning algorithm that avoids the over-smoothing characteristic of centroid-based classifiers by using a different class representative, which we call a clusterhead. The clusterheads, competing for vector-space dominance, are drawn toward misclassified documents, eventually bringing the model to a "balanced state" for a fixed distribution of documents. Subsequently, the clusterheads oscillate between the misclassified documents, heuristically minimizing the misclassification rate, which is an NP-complete problem. Further, the 3LM algorithm prevents over-fitting by "leashing" the clusterheads to their respective centroids. A clusterhead provably converges if its class can be separated from all other classes by a hyperplane. Lifelong learning with a fixed learning rate allows 3LM to adapt to a possibly changing data distribution and to continually learn and unlearn document classes. We report experiments that demonstrate high document classification accuracy on the Reuters-21578, OHSUMED, and TREC07p-spam datasets. The 3LM algorithm showed no over-fitting and consistently outperformed the centroid-based, Naive Bayes, C4.5, AdaBoost, kNN, and SVM classifiers whose accuracy has been reported on the same three corpora.
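The mistake-driven update described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual 3LM update equations, so everything below is an assumption made for illustration: the class name `MistakeDrivenClassifier`, cosine similarity as the scoring function, the linear pull toward a misclassified document, the running-mean centroid, and the leash implemented as a maximum Euclidean distance between a clusterhead and its centroid. The parameter names `eta` and `leash` and their default values are likewise hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    na, nb = norm(a), norm(b)
    return dot(a, b) / (na * nb) if na and nb else 0.0


class MistakeDrivenClassifier:
    """Hypothetical sketch: one clusterhead and one running centroid per class.

    This is not the published 3LM algorithm, only an illustration of the
    ideas named in the abstract (learning from mistakes, fixed learning
    rate, leashing clusterheads to centroids).
    """

    def __init__(self, dim, classes, eta=0.1, leash=0.5):
        self.eta = eta      # fixed learning rate (assumed value)
        self.leash = leash  # assumed: max clusterhead-to-centroid distance
        self.heads = {c: [0.0] * dim for c in classes}
        self.centroids = {c: [0.0] * dim for c in classes}
        self.counts = {c: 0 for c in classes}

    def predict(self, doc):
        # The class whose clusterhead is most similar to the document wins.
        return max(self.heads, key=lambda c: cosine(self.heads[c], doc))

    def learn(self, doc, label):
        # Keep a running mean of each class as its centroid.
        self.counts[label] += 1
        n = self.counts[label]
        self.centroids[label] = [
            m + (x - m) / n for m, x in zip(self.centroids[label], doc)
        ]
        predicted = self.predict(doc)
        if predicted != label:
            # Mistake: pull the true class's clusterhead toward the document,
            # then keep it within the leash radius of its centroid.
            moved = [h + self.eta * (x - h)
                     for h, x in zip(self.heads[label], doc)]
            self.heads[label] = self._apply_leash(moved, self.centroids[label])
        return predicted

    def _apply_leash(self, head, centroid):
        offset = [h - c for h, c in zip(head, centroid)]
        dist = norm(offset)
        if dist <= self.leash:
            return head
        scale = self.leash / dist
        return [c + scale * o for c, o in zip(centroid, offset)]


if __name__ == "__main__":
    clf = MistakeDrivenClassifier(dim=2, classes=["spam", "ham"],
                                  eta=0.5, leash=1.0)
    stream = [([1.0, 0.1], "spam"), ([0.1, 1.0], "ham")] * 3
    for doc, label in stream:
        clf.learn(doc, label)
    print(clf.predict([1.0, 0.1]), clf.predict([0.1, 1.0]))
```

Because clusterheads move only on mistakes, the model stops changing once the stream is classified correctly, and the fixed `eta` lets it keep adapting if the distribution later shifts. A smaller `leash` keeps each clusterhead closer to its centroid, which in this sketch trades adaptability for stability.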