NIFTY: a system for large scale information flow tracking and clustering

Authors:
Caroline Suen;Sandy Huang;Chantat Eksombatchai;Rok Sosic;Jure Leskovec
Affiliations:
Stanford University, Stanford, USA (all authors)
Venue:
Proceedings of the 22nd international conference on World Wide Web
Year:
2013

Abstract

Real-time information on news sites, blogs, and social networking sites changes dynamically and spreads rapidly through the Web. Developing methods for handling such information at a massive scale requires that we think about how information content varies over time, how it is transmitted, and how it mutates as it spreads. We describe the News Information Flow Tracking, Yay! (NIFTY) system for large-scale, real-time tracking of "memes", short textual phrases that travel and mutate through the Web. NIFTY is based on a novel, highly scalable incremental meme-clustering algorithm that efficiently extracts and identifies mutational variants of a single meme. NIFTY runs orders of magnitude faster than our previous Memetracker system, while also maintaining better consistency and quality of extracted memes. We demonstrate the effectiveness of our approach by processing a 20-terabyte dataset of 6.1 billion blog posts and news articles that we have been continuously collecting for the last four years. NIFTY extracted 2.9 billion unique textual phrases and identified more than 9 million memes. Our meme-tracking algorithm was able to process the entire dataset in less than five days using a single machine. Furthermore, we also provide a live deployment of the NIFTY system that allows users to explore the dynamics of online news in near real-time.
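
To make the idea of incrementally clustering mutational variants of a phrase concrete, here is a minimal sketch. It is not NIFTY's algorithm, which the abstract does not specify. It assumes a simple greedy one-pass scheme: each incoming phrase joins the existing cluster whose first phrase shares the most words with it (Jaccard similarity on word sets), or starts a new cluster if no match reaches a cutoff. The names `IncrementalClusterer`, `tokens`, `jaccard`, and the `threshold` value of 0.6 are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def tokens(phrase: str) -> frozenset[str]:
    """Lowercase a phrase and return its set of whitespace-separated words."""
    return frozenset(phrase.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Cluster:
    root: frozenset[str]  # word set of the first phrase assigned to the cluster
    phrases: list[str] = field(default_factory=list)


class IncrementalClusterer:
    """Greedy one-pass clustering (illustrative assumption, not NIFTY's method)."""

    def __init__(self, threshold: float = 0.6):  # hypothetical cutoff
        self.threshold = threshold
        self.clusters: list[Cluster] = []

    def add(self, phrase: str) -> int:
        """Assign one phrase to a cluster and return that cluster's index."""
        words = tokens(phrase)
        best_idx, best_sim = -1, 0.0
        # Linear scan over existing clusters; fine for a toy example only.
        for idx, cluster in enumerate(self.clusters):
            sim = jaccard(words, cluster.root)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= self.threshold:
            self.clusters[best_idx].phrases.append(phrase)
            return best_idx
        # No sufficiently similar cluster: the phrase starts a new one.
        self.clusters.append(Cluster(root=words, phrases=[phrase]))
        return len(self.clusters) - 1


clusterer = IncrementalClusterer(threshold=0.6)
for phrase in [
    "you can put lipstick on a pig",
    "you can put lipstick on a pig but it is still a pig",
    "yes we can",
]:
    print(clusterer.add(phrase), phrase)
```

In this toy run the two "lipstick" variants share 7 of 11 distinct words and land in the same cluster, while the third phrase starts its own. The linear scan over clusters is the obvious bottleneck: it would not scale to billions of phrases without some form of indexing, which this sketch deliberately leaves out.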