A document is known by the company it keeps: neighborhood consensus for short text categorization
Language Resources and Evaluation
Distributional term representations for short-text categorization
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
This paper presents a new model for classifying Chinese short texts that carry a weak concept signal. The model considers three key factors of feature extension that determine short-text classification performance. To determine these factors, the paper studies three issues: (1) how to perform feature extension for short text; (2) how different ways of feature extension affect short-text classification performance; and (3) how to control the degree of feature extension for short text. In the classification stage, a short text is first extended by adding new features or modifying the weights of its initial features, according to the relationship between non-feature terms and the feature extension mode; the degree of feature extension is controlled to improve its effect, and the extended short text is then classified with the new model. Experimental results show that the proposed model, which takes feature extension into account, achieves higher classification performance than conventional classification methods.
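The extension step described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the function name `extend_features`, the `related` relatedness table, and the `max_new` and `damping` parameters are all hypothetical, chosen only to show the two extension modes (adding new features, modifying the weights of initial features) and one possible way to control the degree of extension. The input is assumed to be already word-segmented.

```python
from collections import Counter


def extend_features(tokens, feature_set, related, max_new=2, damping=0.5):
    """Return a {feature: weight} dict for one short text after extension.

    tokens      : terms of the (already segmented) short text
    feature_set : terms the classifier uses as features
    related     : hypothetical map, non-feature term -> {feature: relatedness}
    max_new     : cap on newly added features (one way to limit the degree)
    damping     : scales every contribution that comes from a non-feature term
    """
    counts = Counter(tokens)
    # Initial features: terms of the text that are already in the feature set,
    # weighted by raw term frequency.
    weights = {t: float(c) for t, c in counts.items() if t in feature_set}

    candidates = {}  # features absent from the text, proposed by non-feature terms
    for term, count in counts.items():
        if term in feature_set:
            continue
        for feat, score in related.get(term, {}).items():
            if feat not in feature_set:
                continue
            contribution = damping * score * count
            if feat in weights:
                # Mode 1: modify the weight of an initial feature.
                weights[feat] += contribution
            else:
                # Mode 2: propose a new feature; selection happens below.
                candidates[feat] = candidates.get(feat, 0.0) + contribution

    # Keep only the max_new strongest proposals (ties broken alphabetically).
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    for feat, weight in ranked[:max_new]:
        weights[feat] = weight
    return weights


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    features = {"price", "phone", "apple", "market"}
    relatedness = {
        "iphone": {"phone": 0.9, "apple": 0.8, "market": 0.1},
        "drop": {"price": 0.4},
    }
    print(extend_features(["iphone", "price", "drop"], features, relatedness))
```

In this sketch, setting `max_new=0` disables the addition of new features while still allowing weight updates, and lowering `damping` weakens both modes; the paper's own control mechanism may differ. The resulting weight dictionary would then be passed to whatever classifier is in use.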