A new approach for semi-supervised online news classification

  • Authors:
  • Hon-Man Ko; Wai Lam

  • Affiliations:
  • Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong; Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong

  • Venue:
  • HSI'05 Proceedings of the 3rd international conference on Human Society@Internet: Web and Communication Technologies and Internet-Related Social Issues
  • Year:
  • 2005

Abstract

Due to the dramatic increase of information on the Web, text categorization has become a useful tool for organizing that information. The traditional text categorization problem uses a training set of text documents from online sources with pre-defined class labels. Typically, a large amount of online training news must be provided in order to learn a satisfactory categorization scheme. We investigate an innovative way to alleviate this problem. For each category, only a small number of positive training examples are needed, covering a set of the major concepts associated with the category. We develop a technique that makes use of unlabeled documents, which are easy to collect (for example, online news from the Web). Our technique exploits the inherent structure in the set of positive training documents, guided by the provided concepts of the category. An algorithm for training document adaptation is developed to automatically seek representative training examples from the unlabeled data collected from the new online source. Preliminary experiments on a real-world news collection have been conducted to demonstrate the effectiveness of our approach.
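
The abstract does not spell out the adaptation algorithm, so the following is only a minimal sketch of the general idea of seeking representative training examples in unlabeled data, not the authors' method. It assumes a simple stand-in technique: documents are represented as TF-IDF vectors, the few positive examples of a category are averaged into a centroid, and unlabeled documents are ranked by cosine similarity to that centroid. All function names, the whitespace tokenizer, and the `min_sim` threshold are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF, so every weight is strictly positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Average a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def select_representatives(positives, unlabeled, k, min_sim=0.0):
    """Pick up to k unlabeled documents closest to the positive centroid.

    Hypothetical stand-in for "training document adaptation": documents
    whose similarity does not exceed min_sim are never selected.
    """
    vectors = tfidf_vectors(positives + unlabeled)
    pos_vecs = vectors[: len(positives)]
    unl_vecs = vectors[len(positives):]
    center = centroid(pos_vecs)
    scored = [(cosine(vec, center), doc) for vec, doc in zip(unl_vecs, unlabeled)]
    scored = [(s, d) for s, d in scored if s > min_sim]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]
```

Under these assumptions, calling `select_representatives` with a handful of labeled finance stories and a pool of mixed unlabeled news would return the unlabeled stories that share vocabulary with the finance examples, which could then be added to the training set. A per-concept variant would build one centroid per major concept instead of one per category.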