In Web 2.0 applications, many user-provided texts are as short as 3 to 10 words. Classifying these short texts well helps readers find the messages they need more quickly. In this paper, we propose a method that expands short texts with the help of public search engines in two steps. First, we submit the short text as a query to a public search engine and crawl the result pages. Second, we treat the text of the result pages as background knowledge for the original short text and extract the feature vector from it. Because the number of result pages can be chosen freely, we can obtain a corpus large enough for feature vector extraction, which addresses the data sparseness problem. We conducted experiments under different settings, and the empirical results indicate that this enriched representation of short texts can substantially improve classification performance.
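The two-step expansion can be illustrated with a minimal sketch. It is not the paper's implementation: the search engine is replaced by a hypothetical in-memory stub (`FAKE_INDEX`, `fake_search`), the feature vector is assumed to be a plain term-frequency count, and the names `tokenize`, `expand_short_text`, and `num_pages` are introduced here for illustration only.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-in for a public search engine: maps a query to the
# texts of its result pages. A real system would issue the query and
# crawl the returned pages instead.
FAKE_INDEX: Dict[str, List[str]] = {
    "jaguar top speed": [
        "The jaguar is a big cat that can reach a top speed of 80 km/h.",
        "Jaguar cars: the top speed of the sports model is 300 km/h.",
        "Big cat facts: jaguar speed, habitat and diet.",
    ],
}


def fake_search(query: str) -> List[str]:
    """Return result-page texts for a query (empty list if unknown)."""
    return FAKE_INDEX.get(query.lower(), [])


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_short_text(
    short_text: str,
    search: Callable[[str], List[str]],
    num_pages: int,
) -> Counter:
    """Build a term-frequency vector from the top `num_pages` result pages.

    Step 1: use the short text as the search query.
    Step 2: treat the result pages as background knowledge and count terms.
    """
    features: Counter = Counter()
    for page in search(short_text)[:num_pages]:
        features.update(tokenize(page))
    return features


if __name__ == "__main__":
    sparse = expand_short_text("jaguar top speed", fake_search, num_pages=1)
    dense = expand_short_text("jaguar top speed", fake_search, num_pages=3)
    # More result pages yield a larger vocabulary for the same short text.
    print(len(sparse), len(dense))
```

Raising `num_pages` enlarges the vocabulary available for a three-word query, which is the lever the abstract describes for reducing data sparseness. The resulting vector could then be passed to any standard text classifier.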