On Text-based Mining with Active Learning and Background Knowledge Using SVM

Authors:
Catarina Silva;Bernardete Ribeiro
Affiliations:
Informática – Universidade de Coimbra, CISUC – Departamento de Engenharia, Coimbra, Portugal and Instituto Politécnico de Leiria, Escola Superior de Tecnologia e Gest&# ...;Informática – Universidade de Coimbra, CISUC – Departamento de Engenharia, Coimbra, Portugal
Venue:
Soft Computing - A Fusion of Foundations, Methodologies and Applications
Year:
2007

Citing 0
Cited 7

Classification of Protein Interaction Sentences via Gaussian Processes
PRIB '09 Proceedings of the 4th IAPR International Conference on Pattern Recognition in Bioinformatics

Improving Text Classification Performance with Incremental Background Knowledge
ICANN '09 Proceedings of the 19th International Conference on Artificial Neural Networks: Part I

Distributed text classification with an ensemble kernel-based learning approach
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews

Protein interaction detection in sentences via Gaussian Processes: a preliminary evaluation
International Journal of Data Mining and Bioinformatics

The importance of precision in humour classification
IDEAL'11 Proceedings of the 12th international conference on Intelligent data engineering and automated learning

Purging false negatives in cancer diagnosis using incremental active learning
IDEAL'11 Proceedings of the 12th international conference on Intelligent data engineering and automated learning

Get your jokes right: ask the crowd
MEDI'11 Proceedings of the First international conference on Model and data engineering

Abstract

Text mining, intelligent text analysis, text data mining, and knowledge discovery in text are commonly used names for the process of extracting relevant and non-trivial information from text. Some crucial issues arise when trying to solve this problem, such as document representation and the scarcity of labeled data. This paper addresses these problems by introducing information from unlabeled documents into the training set, using the support vector machine (SVM) separating margin as the differentiating factor. Besides studying the influence of several pre-processing methods and drawing conclusions about their relative significance, we also evaluate the benefits of introducing background knowledge into an SVM text classifier. We further evaluate the possibility of active learning and propose a method for successfully combining background knowledge and active learning. Experimental results show that the proposed techniques, whether used alone or combined, considerably improve classification performance, even when only small labeled training sets are available.
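
The idea of using the SVM margin as the differentiating factor can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation: unlabeled documents that fall far from the separating hyperplane are added to the training set with their predicted label (background knowledge), while the documents closest to the hyperplane are handed to a human for labeling (active learning). The function name `split_by_margin`, the threshold `confident_margin`, and the query budget `n_queries` are all hypothetical, and the decision values are assumed to come from an already trained SVM.

```python
from typing import Dict, List, Sequence, Tuple


def split_by_margin(
    decision_values: Sequence[float],
    confident_margin: float = 1.0,  # assumed threshold, not taken from the paper
    n_queries: int = 2,             # assumed labeling budget per round
) -> Tuple[Dict[int, int], List[int]]:
    """Split unlabeled documents using their signed SVM decision values.

    Returns a pair:
      - a dict mapping document index -> predicted label (+1 or -1) for
        documents at or beyond the confident margin (background knowledge);
      - a list of the indices closest to the hyperplane, to be labeled by
        a human (active learning queries).
    """
    auto_labeled: Dict[int, int] = {}
    uncertain: List[int] = []

    for idx, value in enumerate(decision_values):
        if abs(value) >= confident_margin:
            # Far from the hyperplane: trust the classifier's own prediction.
            auto_labeled[idx] = 1 if value > 0 else -1
        else:
            # Inside the margin: a candidate for a human label.
            uncertain.append(idx)

    # Query the documents the classifier is least sure about first.
    uncertain.sort(key=lambda i: abs(decision_values[i]))
    return auto_labeled, uncertain[:n_queries]


# Hypothetical decision values for six unlabeled documents.
values = [1.8, -0.1, 0.4, -2.3, 0.05, -0.9]
auto_labeled, to_query = split_by_margin(values)
print(auto_labeled)  # documents added with their predicted labels
print(to_query)      # documents sent to the human annotator
```

In a full loop, the classifier would be retrained after each round on the original labeled set plus both groups, and the split repeated on the remaining unlabeled documents.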