Multi-label text categorization using k-nearest neighbor approach with m-similarity

Authors:
Yi Feng;Zhaohui Wu;Zhongmei Zhou
Affiliations:
College of Computer Science, Zhejiang University, Hangzhou, P.R. China;College of Computer Science, Zhejiang University, Hangzhou, P.R. China;College of Computer Science, Zhejiang University, Hangzhou, P.R. China
Venue:
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Year:
2005

Citing 5
Cited 1

BoosTexter: A Boosting-based Systemfor Text Categorization

Machine Learning - Special issue on information retrieval
The string-to-string correction problem with block moves

ACM Transactions on Computer Systems (TOCS)
Pattern matching with swaps

Journal of Algorithms
Text classification using string kernels

The Journal of Machine Learning Research
Combining an order-semisensitive text similarity and closest fit approach to textual missing values in knowledge discovery

KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part II

Document-Base Extraction for Single-Label Text Classification

DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

Due to the ubiquity of textual information nowadays and the multi-topic nature of text, it is of great necessity to explore multi-label text categorization problem. Traditional methods based on vector-space-model text representation suffer the losing of word order information. In this paper, texts are considered as symbol sequences. A multi-label lazy learning approach named kNN-M is proposed, which is derived from traditional k-nearest neighbor (kNN) method. The flexible order-semisensitive measure, M-Similarity, which enables the usage of sequence information in text by swap-allowed dynamic block matching, is applied to evaluate the closeness of texts on finding k-nearest neighbors in kNN-M. Experiments on real-world OHSUMED datasets illustrate that our approach outperforms existing ones considerably, showing the power of considering both term co-occurrence and order on text categorization tasks.