In this paper, we propose and evaluate a novel content-driven crowd discovery algorithm that efficiently identifies newly formed communities of users on the real-time web. Short-lived crowds reflect the real-time interests of their constituents and provide a foundation for user-focused web monitoring. Three salient features of the algorithm are: (i) a prefix-tree-based locality-sensitive hashing approach for discovering crowds in high-volume, rapidly evolving social media; (ii) efficient user profile updating that incorporates new user activities and fades older ones; and (iii) key dimension identification, so that crowd detection can focus on the most active portions of the real-time web. In an extensive experimental study, the algorithm discovers crowds significantly more efficiently than both a k-means clustering-based approach and a MapReduce-based implementation, while maintaining high-quality crowds relative to an offline approach. Additionally, we find that expert crowds tend to be "stickier" and last longer than crowds of typical users.
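The first two features can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes random-hyperplane signatures as the locality-sensitive hash, exponential decay with a fixed half-life as the fading rule, and grouping by a shared signature prefix as a stand-in for descending a prefix tree to a fixed depth. All names (`make_hyperplanes`, `signature`, `FadingProfile`, `group_by_prefix`) and parameters are hypothetical, and key dimension identification is not covered.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(vocab, num_bits, seed=0):
    """Draw one random Gaussian hyperplane per signature bit (assumed LSH family)."""
    rng = random.Random(seed)
    return [{term: rng.gauss(0.0, 1.0) for term in vocab} for _ in range(num_bits)]


def signature(profile, hyperplanes):
    """Map a sparse term-weight profile to a bit string, one bit per hyperplane."""
    bits = []
    for plane in hyperplanes:
        dot = sum(weight * plane.get(term, 0.0) for term, weight in profile.items())
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


class FadingProfile:
    """User profile whose term weights halve every `half_life` time units (assumed rule)."""

    def __init__(self, half_life):
        self.decay = math.log(2) / half_life
        self.weights = defaultdict(float)
        self.last_update = 0.0

    def add(self, terms, now):
        # Fade existing weights by the time elapsed since the last update,
        # then add the new activity at full weight.
        factor = math.exp(-self.decay * (now - self.last_update))
        for term in list(self.weights):
            self.weights[term] *= factor
        for term in terms:
            self.weights[term] += 1.0
        self.last_update = now


def group_by_prefix(signatures, prefix_len):
    """Group users whose signatures share the first `prefix_len` bits.

    This is equivalent to collecting the users below one node at depth
    `prefix_len` in a prefix tree built over the signatures.
    """
    groups = defaultdict(list)
    for user, sig in signatures.items():
        groups[sig[:prefix_len]].append(user)
    return dict(groups)
```

In this sketch a shorter prefix yields larger, looser candidate crowds and a longer prefix yields smaller, tighter ones; users whose profiles point in the same direction always receive the same signature, regardless of how active they are.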