Multi-label text categorization using k-nearest neighbor approach with m-similarity

  • Authors:
  • Yi Feng;Zhaohui Wu;Zhongmei Zhou

  • Affiliations:
  • College of Computer Science, Zhejiang University, Hangzhou, P.R. China;College of Computer Science, Zhejiang University, Hangzhou, P.R. China;College of Computer Science, Zhejiang University, Hangzhou, P.R. China

  • Venue:
  • SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
  • Year:
  • 2005

Abstract

Given the ubiquity of textual information and the multi-topic nature of text, the multi-label text categorization problem deserves close study. Traditional methods based on the vector-space-model text representation lose word-order information. In this paper, texts are treated as symbol sequences. A multi-label lazy learning approach named kNN-M is proposed, derived from the traditional k-nearest neighbor (kNN) method. To evaluate the closeness of texts when finding the k nearest neighbors, kNN-M applies M-Similarity, a flexible order-semisensitive measure that exploits sequence information in text through swap-allowed dynamic block matching. Experiments on real-world OHSUMED datasets show that our approach considerably outperforms existing ones, demonstrating the value of considering both term co-occurrence and term order in text categorization tasks.
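The general shape of a multi-label kNN classifier over token sequences can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the kNN-M algorithm itself: the abstract does not define M-Similarity or the label decision rule, so the similarity here is a stand-in built on Python's standard `difflib.SequenceMatcher` (order-aware block matching, without swap handling), and the neighbor-vote `threshold` is a hypothetical choice. The names `sequence_similarity` and `knn_multilabel` are invented for this example.

```python
from collections import Counter
from difflib import SequenceMatcher


def sequence_similarity(a, b):
    """Stand-in order-aware similarity between two token lists.

    NOTE: this is NOT M-Similarity; it is a placeholder that rewards
    shared blocks of tokens appearing in the same relative order.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def knn_multilabel(query, training, k=3, threshold=0.5,
                   similarity=sequence_similarity):
    """Predict a set of labels for `query` (a list of tokens).

    `training` is a list of (tokens, set_of_labels) pairs.
    A label is predicted when at least `threshold` of the k nearest
    neighbors carry it (an assumed voting rule for illustration).
    """
    # Rank training texts by similarity to the query, most similar first.
    ranked = sorted(training,
                    key=lambda item: similarity(query, item[0]),
                    reverse=True)
    neighbors = ranked[:k]
    if not neighbors:
        return set()

    # Count how many neighbors carry each label.
    votes = Counter(label for _, labels in neighbors for label in labels)
    return {label for label, count in votes.items()
            if count / len(neighbors) >= threshold}


# Toy usage with made-up documents and labels.
train = [
    ("heart attack risk factors".split(), {"cardiology"}),
    ("heart attack treatment with aspirin".split(),
     {"cardiology", "pharmacology"}),
    ("aspirin dosage in heart attack patients".split(),
     {"cardiology", "pharmacology"}),
    ("lung cancer screening guidelines".split(), {"oncology"}),
]
predicted = knn_multilabel("aspirin treatment after heart attack".split(), train)
print(sorted(predicted))
```

Because the similarity function is passed in as a parameter, a swap-tolerant measure in the spirit of M-Similarity could be substituted without changing the neighbor search or the voting step.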