Detecting a suitable topic label for short texts, e.g., tweets from Twitter, is an important component in many applications, including diversity ranking, clustering, information retrieval, and information filtering. Automatically detecting topic labels, however, is a major challenge. The character limit of a short text means that it lacks a feature space large enough to adequately describe its content in relation to other short texts in a given collection. As a result, methods such as LDA, TF-IDF, and similarity measures fail because they are sensitive to a small feature space. And when a collection of related short texts is considered, e.g., from a Twitter search, the result set collectively exhibits sparsity and high dimensionality, a nightmare for information processing. A solution to this problem is to expand the feature space through a process known as pseudo-relevance feedback. Unfortunately, existing pseudo-relevance feedback methods disappoint when subjected to real-world conditions. The fundamental problem lies in the level of noise present in both the short texts and the feedback source, which is often the World Wide Web. We propose a novel pseudo-relevance feedback algorithm to accurately identify topic labels for short texts. Our algorithm robustly handles noise in both the short texts and the feedback source through a method called 'feature matching'. Empirical results confirm the efficacy of our algorithm.
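To make the idea of feature-space expansion concrete, the following is a minimal sketch of generic pseudo-relevance feedback in Python. It is not the 'feature matching' algorithm proposed here, whose details the abstract does not give. The function names (`tokenize`, `expand_features`), the tiny stopword list, the parameter `k`, and the ranking of candidate terms by document frequency are all assumptions made for illustration, and the feedback documents are assumed to have been retrieved already (for example, from a web search on the short text).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for", "at"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def expand_features(short_text, feedback_docs, k=3):
    """Return the short text's terms followed by up to k expansion terms.

    Expansion terms are terms that do not occur in the short text, ranked by
    the number of feedback documents containing them (an assumed heuristic).
    """
    original = tokenize(short_text)
    seen = set(original)

    doc_freq = Counter()
    for doc in feedback_docs:
        # Count each new term at most once per feedback document.
        doc_freq.update(set(tokenize(doc)) - seen)

    # Highest document frequency first; ties broken alphabetically so the
    # result is deterministic.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return original + [term for term, _ in ranked[:k]]
```

A sketch like this treats every feedback document as relevant, so noisy feedback sources feed noisy terms straight into the expanded feature space, which is the weakness of naive pseudo-relevance feedback that the abstract points to.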