Improving short text classification using public search engines
IUKM'13 Proceedings of the 2013 international conference on Integrated Uncertainty in Knowledge Modelling and Decision Making
Kernel methods are widely used for document classification in diverse domains. Popular kernels such as bag-of-words kernels and tree kernels give satisfactory results when classifying documents such as articles, e-mails or web pages. However, they perform less well on short-text documents, because a short document yields too few features. To address this problem, this paper presents a novel kernel function, the semantic pattern tree kernel, for classifying short-text documents. The proposed kernel extends the feature space of each document by incorporating syntactic and semantic information through three levels of semantic annotation. Experiments on the Open Directory Project dataset show that, for short-text documents, the semantic pattern tree kernel achieves higher accuracy than conventional kernels.
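The sparsity problem described above can be illustrated with a minimal sketch. This is not the semantic pattern tree kernel from the paper; it is a plain cosine-normalised bag-of-words kernel, and the function names and example strings are hypothetical, chosen only for illustration. It shows how two short texts on a related topic can share no surface words and therefore receive a kernel value of zero.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Map a text to word counts (lower-cased, whitespace-tokenised)."""
    return Counter(text.lower().split())


def bow_kernel(a, b):
    """Cosine-normalised bag-of-words kernel between two texts."""
    x, y = bag_of_words(a), bag_of_words(b)
    # Dot product over the words the two texts have in common.
    dot = sum(x[w] * y[w] for w in x.keys() & y.keys())
    norm = math.sqrt(sum(v * v for v in x.values())) * math.sqrt(
        sum(v * v for v in y.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical short texts: related in meaning, but with no shared words.
short_a = "cheap flights paris"
short_b = "low cost airfare france"

print(bow_kernel(short_a, short_b))  # no overlapping words, so the value is 0.0
```

A kernel that only counts shared words cannot relate these two texts, which is the gap that adding syntactic and semantic annotations is meant to close.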