Real-time information on news sites, blogs, and social networking sites changes dynamically and spreads rapidly through the Web. Developing methods for handling such information at massive scale requires that we think about how information content varies over time, how it is transmitted, and how it mutates as it spreads. We describe the News Information Flow Tracking, Yay! (NIFTY) system for large-scale real-time tracking of "memes": short textual phrases that travel and mutate through the Web. NIFTY is based on a novel, highly scalable incremental meme-clustering algorithm that efficiently extracts memes and identifies the mutational variants of each one. NIFTY runs orders of magnitude faster than our previous Memetracker system while also extracting memes with better consistency and quality. We demonstrate the effectiveness of our approach by processing a 20-terabyte dataset of 6.1 billion blog posts and news articles that we have been collecting continuously for the last four years. NIFTY extracted 2.9 billion unique textual phrases and identified more than 9 million memes. Our meme-tracking algorithm processed the entire dataset in less than five days on a single machine. We also provide a live deployment of the NIFTY system that allows users to explore the dynamics of online news in near real time.
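To make the idea of incrementally clustering mutational variants concrete, the following is a minimal illustrative sketch and not the NIFTY algorithm itself, whose details are not given here. It assumes that two phrases are variants when their word-bigram sets overlap strongly, and it assigns each arriving phrase to the first sufficiently similar cluster or opens a new one. The names `IncrementalPhraseClusterer`, `word_shingles`, and `overlap`, the overlap-coefficient measure, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import FrozenSet, List, Tuple

Shingles = FrozenSet[Tuple[str, ...]]


def word_shingles(phrase: str, k: int = 2) -> Shingles:
    """Return the set of k-word shingles of a phrase (lowercased, whitespace split)."""
    words = phrase.lower().split()
    if not words:
        return frozenset()
    if len(words) < k:
        return frozenset([tuple(words)])
    return frozenset(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def overlap(a: Shingles, b: Shingles) -> float:
    """Overlap coefficient: shared shingles divided by the size of the smaller set.

    Assumption: chosen because a shortened quote should still match its longer source.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


class IncrementalPhraseClusterer:
    """Hypothetical single-pass clusterer: each phrase is seen once, in arrival order."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold            # assumed similarity cutoff
        self.representatives: List[Shingles] = []  # shingles of each cluster's first phrase
        self.clusters: List[List[str]] = []   # phrases grouped by cluster

    def add(self, phrase: str) -> int:
        """Assign the phrase to an existing cluster or open a new one; return its index."""
        shingles = word_shingles(phrase)
        # Linear scan over representatives; a large-scale system would need an index instead.
        for idx, rep in enumerate(self.representatives):
            if overlap(shingles, rep) >= self.threshold:
                self.clusters[idx].append(phrase)
                return idx
        self.representatives.append(shingles)
        self.clusters.append([phrase])
        return len(self.clusters) - 1


clusterer = IncrementalPhraseClusterer(threshold=0.5)
first = clusterer.add("you can put lipstick on a pig")
variant = clusterer.add("put lipstick on a pig but it is still a pig")
other = clusterer.add("the fundamentals of our economy are strong")
```

In this sketch the second phrase joins the first phrase's cluster because they share four of the first phrase's six bigrams, while the third phrase opens a new cluster. The linear scan over representatives would not scale to billions of phrases; it is kept here only to keep the incremental assignment step easy to read.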