Many novel applications are built on analyzing tweets about specific topics. Although these applications provide different kinds of analysis, they share a common task: monitoring "target" tweets for a topic in the Twitter stream. The current solution to this task is to track a set of manually selected keywords with the Twitter APIs, and this manual approach has many limitations. In this paper, we propose a data platform that automatically monitors target tweets in the Twitter stream for any given topic. To monitor target tweets optimally and continuously, we design the Automatic Topic-focused Monitor (ATM), which iteratively 1) samples tweets from the stream and 2) selects keywords to track based on the samples. To realize ATM, we develop a tweet sampling algorithm that obtains sufficient unbiased tweets through the available Twitter APIs, and a keyword selection algorithm that efficiently selects keywords with near-optimal coverage of target tweets under cost constraints. We conduct extensive experiments to show the effectiveness of ATM. For example, ATM covers 90% of target tweets for a topic and improves on the manual approach by 49%.
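To make the keyword selection step concrete, the following is a minimal sketch that assumes the problem can be cast as budgeted coverage: each candidate keyword covers some target tweets in the sample and carries a cost, and keywords are chosen greedily by marginal coverage per unit cost until the budget is spent. The abstract does not specify the algorithm, so the function `select_keywords`, its inputs, and the sample data are hypothetical, and this plain greedy heuristic is not claimed to reproduce the near-optimal guarantee stated above.

```python
def select_keywords(sample, costs, budget):
    """Greedy budgeted-coverage heuristic (illustrative assumption, not the paper's algorithm).

    sample: dict mapping keyword -> set of target-tweet ids in the sample containing it
    costs:  dict mapping keyword -> positive cost of tracking that keyword
    budget: maximum total cost of the selected keywords
    """
    chosen = []          # keywords picked so far, in selection order
    covered = set()      # target tweets covered by the chosen keywords
    spent = 0.0
    remaining = set(sample)

    while remaining:
        best, best_ratio = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for kw in sorted(remaining):
            if spent + costs[kw] > budget:
                continue  # this keyword no longer fits in the budget
            gain = len(sample[kw] - covered)  # newly covered target tweets
            ratio = gain / costs[kw]
            if ratio > best_ratio:
                best, best_ratio = kw, ratio
        if best is None:
            break  # nothing affordable adds coverage
        chosen.append(best)
        covered |= sample[best]
        spent += costs[best]
        remaining.discard(best)

    return chosen, covered


# Hypothetical sample: keyword -> ids of sampled target tweets that contain it.
sample = {
    "quake": {1, 2, 3},
    "earthquake": {3, 4},
    "shake": {2},
    "news": {1, 2, 3, 4, 5},
}
costs = {"quake": 2, "earthquake": 1, "shake": 1, "news": 10}

chosen, covered = select_keywords(sample, costs, budget=4)
print(chosen, sorted(covered))
```

In this toy input, the broad keyword "news" covers every sampled target tweet but exceeds the budget, so the sketch settles for cheaper keywords that together cover most of them. In an iterative monitor such as the one described above, a step like this would be rerun each time a fresh sample arrives.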