This paper explores incorporating prior knowledge into support vector machines to compensate for a shortage of training data in text categorization. Prior knowledge about transformation invariance is generated with a virtual document method. The method applies a simple transformation to the training set: it creates virtual documents by combining pairs of documents that are relevant to the same topic. Each virtual document is expected not only to preserve the topic but also to improve its representation, by exploiting relevant terms that are not given high importance in the individual real documents. The artificially generated documents change the distribution of the training data without introducing randomization. Experiments on the Reuters-21578 collection with support vector machines using linear, polynomial, and radial-basis function kernels showed the method to be effective for topics with few relevant documents. The proposed method improved micro-averaged F1 by 131%, 34%, and 12% for the 25, 46, and 58 topics with fewer than 10, 30, and 50 relevant training documents, respectively. Analysis of the results indicates that incorporating virtual documents yields steady performance improvements.
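The virtual-document step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: documents are bags of words (`collections.Counter`), a pair of relevant documents is merged by summing term counts, and every pair is used. The helper names (`make_virtual_documents`, `augment_training_set`, `micro_f1`) and the toy data are hypothetical. `micro_f1` illustrates how micro-averaged F1, the measure the results are reported in, pools true positives, false positives, and false negatives across all topics before computing F1.

```python
from collections import Counter
from itertools import combinations

# A document is represented as a bag of words: a Counter mapping term -> count.
# ASSUMPTION: two relevant documents are combined by summing their term counts;
# the paper's exact combination rule and pair selection may differ.


def make_virtual_documents(docs, labels, topic):
    """Build one virtual document per pair of training docs relevant to `topic`."""
    relevant = [doc for doc, doc_labels in zip(docs, labels) if topic in doc_labels]
    # n relevant documents yield n * (n - 1) / 2 virtual documents.
    return [a + b for a, b in combinations(relevant, 2)]


def augment_training_set(docs, labels, topic):
    """Append the virtual documents to the training set, each labeled with `topic`."""
    virtual = make_virtual_documents(docs, labels, topic)
    return docs + virtual, labels + [{topic} for _ in virtual]


def micro_f1(per_topic_counts):
    """Micro-averaged F1 from (tp, fp, fn) tuples, one tuple per topic.

    Counts are pooled across topics, then F1 = 2TP / (2TP + FP + FN).
    """
    tp = sum(c[0] for c in per_topic_counts)
    fp = sum(c[1] for c in per_topic_counts)
    fn = sum(c[2] for c in per_topic_counts)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


# Usage with toy data (hypothetical documents and topic names).
docs = [
    Counter("wheat crop export".split()),
    Counter("wheat price".split()),
    Counter("oil price".split()),
]
labels = [{"grain"}, {"grain"}, {"crude"}]
aug_docs, aug_labels = augment_training_set(docs, labels, "grain")
print(len(aug_docs))  # 4: three real documents plus one virtual "grain" document
print(aug_docs[3])    # the merged term counts of the two "grain" documents
```

Because every pair is combined, a topic with 9 relevant training documents would gain up to 36 virtual documents, so the augmentation has the most effect on the sparsest topics. In a full pipeline the augmented bags of words would then be term-weighted (for example with TF-IDF and length normalization) and passed to an SVM with a linear, polynomial, or RBF kernel; that step is omitted here.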