Improving short text classification using public search engines

Authors:
Wang Meng;Lin Lanfen;Wang Jing;Yu Penghua;Liu Jiaolong;Xie Fei
Affiliations:
College of Computer Science and Technology, Zhejiang University, Hangzhou, China;College of Computer Science and Technology, Zhejiang University, Hangzhou, China;College of Computer Science and Technology, Zhejiang University, Hangzhou, China;College of Computer Science and Technology, Zhejiang University, Hangzhou, China;College of Computer Science and Technology, Zhejiang University, Hangzhou, China;College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Venue:
IUKM'13 Proceedings of the 2013 international conference on Integrated Uncertainty in Knowledge Modelling and Decision Making
Year:
2013

Abstract

In Web 2.0 applications, many of the texts provided by users are as short as 3 to 10 words. Classifying these short texts well helps readers find the messages they need more quickly. In this paper, we propose a method that expands short texts with the help of public search engines in two steps. First, we submit the short text as a query to a public search engine and crawl the result pages. Second, we treat the text of the result pages as background knowledge about the original short text and extract the feature vector from it. We can thus choose a suitable number of result pages to obtain enough text for feature vector extraction, which addresses the data sparseness problem. We conducted experiments under different settings, and the empirical results indicate that this enriched representation of short texts can substantially improve classification performance.
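
The two-step expansion described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `fake_search` is a hypothetical stand-in for querying a public search engine and crawling its result pages, `num_pages` is an assumed name for the number of result pages used, and plain term-frequency counts are an assumed feature representation, since the abstract does not say which features are extracted.

```python
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_short_text(
    short_text: str,
    search_fn: Callable[[str], List[str]],
    num_pages: int = 3,
) -> Counter:
    """Build a term-frequency vector from the top result pages.

    Step 1: send the short text to the search function and keep the
            first `num_pages` result texts.
    Step 2: treat those texts as background knowledge and count their terms.
    """
    pages = search_fn(short_text)[:num_pages]
    counts: Counter = Counter()
    for page in pages:
        counts.update(tokenize(page))
    return counts


def fake_search(query: str) -> List[str]:
    """Hypothetical stand-in for a search engine; returns canned page texts."""
    return [
        "Jaguar is a British maker of luxury cars and sports cars.",
        "The new Jaguar car model was unveiled at the motor show.",
        "Car reviews: speed, engine and price compared.",
    ]


# A two-word query yields a much denser vector once result pages are included.
vector_two_pages = expand_short_text("jaguar speed", fake_search, num_pages=2)
vector_three_pages = expand_short_text("jaguar speed", fake_search, num_pages=3)
print(len(tokenize("jaguar speed")), len(vector_two_pages), len(vector_three_pages))
```

Raising `num_pages` adds more background text per short text, which is the knob the abstract refers to when it mentions choosing a suitable number of result pages. The resulting counts could then be fed to any standard text classifier.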