Classification of short texts by deploying topical annotations

Authors: Daniele Vitale; Paolo Ferragina; Ugo Scaiella
Affiliations: Dipartimento di Informatica, University of Pisa, Italy (all three authors)
Venue: ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
Year: 2012

References (17)

Machine learning in automated text categorization. ACM Computing Surveys (CSUR)
Improving Short-Text Classification using Unlabeled Data for Classification Problems. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
A web-based kernel function for measuring the similarity of short text snippets. Proceedings of the 15th international conference on World Wide Web
Measuring semantic similarity between words using web search engines. Proceedings of the 16th international conference on World Wide Web
The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering
Clustering short texts using Wikipedia. SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Learning to classify short and sparse text & web with hidden topics from large-scale data collections. Proceedings of the 17th international conference on World Wide Web
Collective annotation of Wikipedia entities in web text. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
WikiRelate! computing semantic relatedness using Wikipedia. AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Mining meaning from Wikipedia. International Journal of Human-Computer Studies
Wikipedia-based semantic interpretation for natural language processing. Journal of Artificial Intelligence Research
Feature generation for text categorization using world knowledge. IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Short text classification in Twitter to improve information filtering. Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Towards effective short text deep classification. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Discovering context: classifying tweets through a semantic transform based on Wikipedia. FAC'11 Proceedings of the 6th international conference on Foundations of augmented cognition: directing the future of adaptive systems
Robust disambiguation of named entities in text. EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Cited by (6)

Graph-based collective classification for tweets. Proceedings of the 21st ACM international conference on Information and knowledge management
Using text-based web image search results clustering to minimize mobile devices wasted space-interface. ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Improved text annotation with Wikipedia entities. Proceedings of the 28th Annual ACM Symposium on Applied Computing
Harnessing linked knowledge sources for topic classification in social media. Proceedings of the 24th ACM Conference on Hypertext and Social Media
A framework for benchmarking entity-annotation systems. Proceedings of the 22nd international conference on World Wide Web
Short text classification by detecting information path. Proceedings of the 22nd ACM international conference on Information & knowledge management

Abstract

We propose a novel approach to the classification of short texts that rests on two ingredients. The first is the use of recently introduced Wikipedia-based annotators, which detect the main topics present in an input text and represent them as Wikipedia pages. The second is a novel classification algorithm that measures the similarity between the input text and each output category using only their annotated topics and the Wikipedia link structure. Our approach forgoes the common practice of expanding the feature space with new dimensions derived from either explicit or latent semantic analysis. As a consequence, it is simple and maintains a compact, intelligible representation of the output categories. Our experiments show that it is efficient in both construction and query time, as accurate as state-of-the-art classifiers (see, e.g., Phan et al., WWW '08), and robust with respect to concept drift and input sources.
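
The abstract does not spell out the similarity measure, so the following is only a minimal sketch of the general idea: texts and categories are both reduced to sets of Wikipedia topics, and a text is assigned to the category whose topics are most related to its own according to the link structure. Everything named here is an assumption made for illustration: the toy in-link table `IN_LINKS`, the `TOTAL_PAGES` constant, the in-link-overlap relatedness (in the style of Milne and Witten, swapped in because the paper's own measure is not given here), and the averaging rule in `score`. The annotator step is skipped; topics are passed in as if an annotator had already produced them.

```python
import math

# Hypothetical toy data: Wikipedia page -> ids of pages that link to it.
IN_LINKS = {
    "Football": {1, 2, 3, 4},
    "Goalkeeper": {2, 3, 4, 5},
    "Stock market": {6, 7, 8},
    "Bond (finance)": {7, 8, 9},
}
TOTAL_PAGES = 10  # size of the toy link graph (assumption)


def relatedness(a, b):
    """Relatedness in [0, 1] based on shared in-links (assumed measure)."""
    if a == b:
        return 1.0
    links_a = IN_LINKS.get(a, set())
    links_b = IN_LINKS.get(b, set())
    common = len(links_a & links_b)
    if common == 0:
        return 0.0
    big = max(len(links_a), len(links_b))
    small = min(len(links_a), len(links_b))
    denominator = math.log(TOTAL_PAGES) - math.log(small)
    if denominator <= 0:
        # Both pages are linked from every page in the graph.
        return 1.0
    distance = (math.log(big) - math.log(common)) / denominator
    return max(0.0, 1.0 - distance)


def score(text_topics, category_topics):
    """Average, over the text's topics, of the best match in the category."""
    if not text_topics or not category_topics:
        return 0.0
    best = [max(relatedness(t, c) for c in category_topics) for t in text_topics]
    return sum(best) / len(best)


def classify(text_topics, categories):
    """Return the name of the category with the highest score."""
    return max(categories, key=lambda name: score(text_topics, categories[name]))


# Each category is represented only by a small set of topics.
CATEGORIES = {
    "Sports": ["Football"],
    "Business": ["Stock market"],
}

if __name__ == "__main__":
    print(classify(["Goalkeeper"], CATEGORIES))
    print(classify(["Bond (finance)"], CATEGORIES))
```

Note that no feature space is expanded in this sketch: each category stays a short, human-readable list of topics, which is the kind of compact representation the abstract describes.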