Multi-view Semi-supervised Learning: An Approach to Obtain Different Views from Text Datasets

  • Authors:
  • Edson Takashi Matsubara; Maria Carolina Monard; Gustavo E. A. P. A. Batista

  • Affiliations:
  • University of São Paulo (USP), Institute of Mathematics and Computer Science (ICMC), Laboratory of Computational Intelligence (LABIC), P.O. Box 668, 13560-970, São Carlos, SP, Brazil, {e ... (same affiliation for all three authors)

  • Venue:
  • Proceedings of the 2005 conference on Advances in Logic Based Intelligent Systems: Selected Papers of LAPTEC 2005
  • Year:
  • 2005

Abstract

Supervised machine learning usually requires a large number of labelled examples to learn accurately. However, labelling can be a costly and time-consuming process, especially when performed manually. In contrast, unlabelled examples are usually inexpensive and easy to obtain. This is the case for text classification tasks involving on-line data sources, such as web pages, email and scientific papers. Semi-supervised learning, a relatively new area in machine learning, blends supervised and unsupervised learning, and has the potential to reduce the need for expensive labelled data when only a small set of labelled examples is available. Multi-view semi-supervised learning requires that the description of each example be partitioned into at least two distinct views. In this work, we propose a simple approach to pre-processing textual documents that makes it easy to construct the two different views required by any multi-view learning algorithm. Experimental results on text classification are described, suggesting that our proposal for constructing the views performs well in practice.
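
To make the idea of a "partitioned description" concrete, the sketch below builds two feature views from one document. The abstract does not say how the views are constructed, so the split used here (single words in one view, adjacent word pairs in the other) is only an illustrative assumption, and the names tokenize and two_views are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def two_views(text):
    """Return two feature descriptions (views) of one document.

    Assumption for illustration only: view A counts single words and
    view B counts adjacent word pairs. The abstract does not specify
    the pre-processing actually used to build the views.
    """
    tokens = tokenize(text)
    view_a = Counter(tokens)                    # single-word counts
    view_b = Counter(zip(tokens, tokens[1:]))   # adjacent-pair counts
    return view_a, view_b


docs = [
    "Unlabelled web pages are easy to obtain",
    "Labelling examples by hand is costly",
]
# Each document is now described twice, once per view.
views = [two_views(d) for d in docs]
```

A multi-view learner such as co-training would then train one classifier per view and let each one label unlabelled examples for the other. That step is omitted here.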