Textual information is now ubiquitous, and a single text often covers several topics, which makes multi-label text categorization an important problem. Traditional methods built on the vector-space-model representation of text lose word-order information. In this paper, texts are treated as symbol sequences. We propose kNN-M, a multi-label lazy learning approach derived from the traditional k-nearest neighbor (kNN) method. To find the k nearest neighbors, kNN-M evaluates the closeness of texts with M-Similarity, a flexible order-semisensitive measure that exploits the sequence information in text through swap-allowed dynamic block matching. Experiments on real-world OHSUMED datasets show that our approach considerably outperforms existing ones, demonstrating the value of considering both term co-occurrence and term order in text categorization tasks.
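The general idea of a multi-label kNN over token sequences can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define M-Similarity or the kNN-M decision rule, so `block_similarity` is a hypothetical stand-in (greedy matching of the longest common token blocks in any order, scored by squared block length), and `knn_multilabel` uses an assumed similarity-weighted vote with an arbitrary threshold. The toy documents and labels are invented for the example.

```python
from collections import Counter

# Sentinels marking tokens already consumed by a matched block.
# They are distinct objects, so a used token never matches anything again.
_USED_A = object()
_USED_B = object()


def longest_common_block(a, b):
    """Return (start_a, start_b, length) of the longest common contiguous block."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def block_similarity(a, b):
    """Stand-in for an order-semisensitive measure (NOT the paper's M-Similarity).

    Blocks may match in any order (swaps allowed), but longer blocks score
    more, so preserved word order is rewarded. Result lies in [0, 1].
    """
    if not a or not b:
        return 0.0
    a, b = list(a), list(b)
    score = 0
    while True:
        i, j, n = longest_common_block(a, b)
        if n == 0:
            break
        score += n * n
        a[i:i + n] = [_USED_A] * n
        b[j:j + n] = [_USED_B] * n
    return score / (len(a) * len(b))


def knn_multilabel(query, train, k=3, threshold=0.5):
    """Assumed decision rule: keep labels whose similarity-weighted vote share
    among the k nearest neighbors reaches `threshold`.

    train: list of (token_list, set_of_labels) pairs.
    """
    scored = sorted(
        ((block_similarity(query, toks), labels) for toks, labels in train),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        return set()
    votes = Counter()
    for sim, labels in scored:
        for label in labels:
            votes[label] += sim
    return {label for label, v in votes.items() if v / total >= threshold}


# Toy data, invented for illustration only.
train = [
    ("heart attack risk in patients".split(), {"cardiology"}),
    ("risk of heart attack in diabetic patients".split(),
     {"cardiology", "endocrinology"}),
    ("insulin therapy for diabetic patients".split(), {"endocrinology"}),
]
query = "heart attack risk in diabetic patients".split()

if __name__ == "__main__":
    print(knn_multilabel(query, train, k=3, threshold=0.4))
```

A bag-of-words measure would treat "a b c d" and "c d a b" as identical, whereas the stand-in above scores the swapped pair lower than an exact match while still crediting both blocks, which is the kind of behavior the term "order-semisensitive" suggests.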