Searching for quality microblog posts: filtering and ranking based on content analysis and implicit links

Authors:
Jan Vosecky;Kenneth Wai-Ting Leung;Wilfred Ng
Affiliations:
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China;Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China;Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Venue:
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
Year:
2012



Abstract

Social networking has become a popular web activity, with millions of people creating a large amount of information every day. However, research on effectively searching such social information is still in its infancy. In this paper, we focus on Twitter, a rapidly growing microblogging platform that hosts a large volume of diverse content of varying quality. To surface higher-quality content (e.g. posts mentioning news, events, useful facts or well-formed opinions) when a user searches for tweets, we propose a new method to filter and rank tweets according to their quality. To model tweet quality, we devise a new set of link-based features in addition to content-based features. We examine the implicit links between tweets, URLs, hashtags and users, and propose novel metrics that reflect the popularity as well as the quality-based reputation of websites, hashtags and users. We then evaluate both the content-based and link-based features in terms of classification effectiveness and identify the feature subset that achieves the best classification accuracy. A detailed evaluation of our filtering and ranking models shows that this feature subset outperforms the traditional bag-of-words representation while requiring significantly less computational time and storage. Moreover, we demonstrate that the proposed metrics based on implicit links are effective for determining tweet quality.
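
The abstract does not define the link-based metrics, so the following is only a rough sketch of the general idea of a quality-based reputation derived from implicit links. It assumes a hypothetical metric (a smoothed share of high-quality tweets per user, hashtag or website domain) and hypothetical field names (`users`, `hashtags`, `domains`, `quality`); none of these come from the paper, and the paper's actual formulas may differ.

```python
from collections import defaultdict


def entity_reputation(tweets, key, prior=0.5, strength=2.0):
    """Smoothed share of high-quality tweets per entity.

    Hypothetical metric for illustration, not the paper's definition.
    `prior` and `strength` pull rarely seen entities towards a neutral score.
    """
    good = defaultdict(int)
    total = defaultdict(int)
    for tweet in tweets:
        for entity in tweet[key]:
            total[entity] += 1
            good[entity] += tweet["quality"]  # 1 = high quality, 0 = low quality
    return {e: (good[e] + prior * strength) / (total[e] + strength) for e in total}


def link_score(tweet, reputations, prior=0.5):
    """Average reputation of everything the tweet is implicitly linked to."""
    scores = [
        rep.get(entity, prior)  # unseen entities fall back to the neutral prior
        for key, rep in reputations.items()
        for entity in tweet[key]
    ]
    return sum(scores) / len(scores) if scores else prior


# Toy labeled tweets (assumed data, for illustration only).
labeled = [
    {"users": ["alice"], "hashtags": ["news"], "domains": ["bbc.co.uk"], "quality": 1},
    {"users": ["alice"], "hashtags": ["news"], "domains": [], "quality": 1},
    {"users": ["bob"], "hashtags": ["lol"], "domains": [], "quality": 0},
    {"users": ["bob"], "hashtags": ["news"], "domains": ["bbc.co.uk"], "quality": 1},
]
reputations = {
    key: entity_reputation(labeled, key) for key in ("users", "hashtags", "domains")
}

# Rank unseen search results by their link-based score, best first.
candidates = [
    {"id": "t1", "users": ["bob"], "hashtags": ["lol"], "domains": []},
    {"id": "t2", "users": ["alice"], "hashtags": ["news"], "domains": ["bbc.co.uk"]},
]
ranked = sorted(candidates, key=lambda t: link_score(t, reputations), reverse=True)
print([t["id"] for t in ranked])
```

In a fuller pipeline, a score of this kind would be one feature among the content-based and link-based features that the abstract says are fed to the filtering and ranking models, rather than a ranking criterion on its own.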