Content-based crowd retrieval on the real-time web

Authors:
Krishna Y. Kamath;James Caverlee
Affiliations:
Texas A&M University, College Station, TX, USA;Texas A&M University, College Station, TX, USA
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

In this paper, we propose and evaluate a novel content-driven crowd discovery algorithm that efficiently identifies newly formed communities of users on the real-time web. These short-lived crowds reflect the real-time interests of their constituents and provide a foundation for user-focused web monitoring. The algorithm has three salient features: (i) a prefix-tree-based locality-sensitive hashing approach for discovering crowds in high-volume, rapidly evolving social media; (ii) efficient user profile updating that incorporates new user activities and fades older ones; and (iii) key dimension identification, so that crowd detection can focus on the most active portions of the real-time web. In an extensive experimental study, we find that crowd discovery is significantly more efficient than both a k-means clustering-based approach and a MapReduce-based implementation, while crowd quality remains high relative to an offline approach. Additionally, we find that expert crowds tend to be "stickier" and longer-lived than crowds of typical users.
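
To make features (i) and (ii) of the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes random-hyperplane signatures as the locality-sensitive hash, exponential decay as the fading rule, and a nested-dictionary prefix tree; the names `DecayedProfile`, `signature`, `PrefixTree`, and the parameters `NUM_BITS` and `decay` are hypothetical. Key dimension identification (iii) is not covered.

```python
import random
from collections import defaultdict

NUM_BITS = 16  # signature length; a hypothetical setting, not from the paper


def hyperplane_component(bit: int, term: str) -> float:
    """Deterministic pseudo-random Gaussian weight for a (hyperplane, term) pair."""
    return random.Random(f"{bit}:{term}").gauss(0.0, 1.0)


def signature(profile: dict, num_bits: int = NUM_BITS) -> str:
    """Random-hyperplane LSH: one bit per hyperplane, given by the sign of the dot product."""
    bits = []
    for bit in range(num_bits):
        dot = sum(w * hyperplane_component(bit, t) for t, w in profile.items())
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


class DecayedProfile:
    """Term-weight profile in which older activity fades exponentially (assumed rule)."""

    def __init__(self, decay: float = 0.5):
        self.decay = decay
        self.weights = defaultdict(float)
        self.last_time = 0.0

    def update(self, terms, now: float) -> None:
        # Fade existing weights by the elapsed time, then add the new terms.
        factor = self.decay ** (now - self.last_time)
        for term in self.weights:
            self.weights[term] *= factor
        for term in terms:
            self.weights[term] += 1.0
        self.last_time = now


class PrefixTree:
    """Trie over signature bits; users sharing a prefix are candidate crowd members."""

    def __init__(self):
        self.root = {}

    def insert(self, sig: str, user: str) -> None:
        node = self.root
        for b in sig:
            node = node.setdefault(b, {})
        node.setdefault("users", set()).add(user)

    def candidates(self, sig: str, prefix_len: int) -> set:
        # Walk down the first prefix_len bits, then collect every user below.
        node = self.root
        for b in sig[:prefix_len]:
            node = node.get(b)
            if node is None:
                return set()
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key == "users":
                    found |= child
                else:
                    stack.append(child)
        return found
```

In this sketch, a shorter `prefix_len` returns a larger, looser candidate set, while a longer one restricts candidates to users whose profiles point in nearly the same direction. Candidates would still need an exact similarity check before being grouped into a crowd.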