Efficient monitoring of personalized hot news over Web 2.0 streams

Authors:
Parisa Haghani;Sebastian Michel;Karl Aberer
Affiliations:
EPFL IC ISC LSIR, Lausanne, Switzerland 1015;Saarland University, Saarbrücken, Germany 66123;EPFL IC ISC LSIR, Lausanne, Switzerland 1015
Venue:
Computer Science - Research and Development
Year:
2012

Abstract

Web 2.0 streams, such as blog postings, microblog tweets, and RSS feeds from online communities, offer a wealth of the latest news about real-world events and societal discussion. From a user's perspective, it becomes increasingly difficult to get a decent overview of recent events, given these massive, continuously flowing streams of information. Ideally, a system would continuously assemble recent information, ranked by its current social impact but also weighted by the user's personal interests. In this work, we develop methods to meet these requirements. The presented approach continuously tracks the most popular tags attached to incoming items and, based on these, constructs a dynamic top-k query. By continuously evaluating this query on the incoming stream, we are able to retrieve the currently hottest items. These hottest items are then fed into an engine that re-ranks them with respect to user-specified interests, given in the form of term-based topic descriptions. Given the high rate of incoming data and the immense number of user profiles, this calls for high-performance algorithms that retrieve hot documents efficiently and subsequently personalize them based on user profiles. We present a combined solution that builds on our prior work on information filtering and show how it can be used together with the current work on continuously determining the hottest documents. To demonstrate the suitability of our approach, we report a performance evaluation using a real-world dataset obtained from a weblog crawl.
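
The pipeline outlined in the abstract (track popular tags, derive a dynamic top-k query from them, retrieve the hottest items, then re-rank per user) can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the authors' algorithm: the count-based sliding window, the scoring of an item as the sum of its hot-tag counts, the term-overlap personalization, and all names (HotItemMonitor, personalize, window_size, num_hot_tags) are hypothetical choices made only for illustration.

```python
from collections import Counter, deque
import heapq


class HotItemMonitor:
    """Toy sliding-window monitor (illustrative assumption, not the paper's method)."""

    def __init__(self, window_size, num_hot_tags, k):
        self.window = deque()        # most recent items, oldest first
        self.tag_counts = Counter()  # tag frequencies inside the window
        self.window_size = window_size
        self.num_hot_tags = num_hot_tags
        self.k = k

    def add(self, item_id, tags, terms):
        """Insert a new item and evict the oldest one if the window overflows."""
        tag_set = frozenset(tags)
        self.window.append((item_id, tag_set, frozenset(terms)))
        self.tag_counts.update(tag_set)
        if len(self.window) > self.window_size:
            _, old_tags, _ = self.window.popleft()
            self.tag_counts.subtract(old_tags)

    def hot_tags(self):
        """The dynamic query: the currently most frequent tags and their counts."""
        positive = [(t, c) for t, c in self.tag_counts.items() if c > 0]
        top = heapq.nlargest(self.num_hot_tags, positive,
                             key=lambda tc: (tc[1], tc[0]))
        return dict(top)

    def hottest(self):
        """Top-k items in the window, scored by the counts of their hot tags."""
        query = self.hot_tags()
        scored = [(sum(query.get(t, 0) for t in tags), item_id, terms)
                  for item_id, tags, terms in self.window]
        return heapq.nlargest(self.k, scored, key=lambda s: (s[0], s[1]))


def personalize(hot_items, profile_terms):
    """Re-rank hot items by term overlap with a profile, then by hotness."""
    profile = set(profile_terms)
    ranked = sorted(hot_items,
                    key=lambda s: (len(s[2] & profile), s[0]),
                    reverse=True)
    return [item_id for _, item_id, _ in ranked]
```

In this sketch, each call to add updates the tag counts incrementally, so the hot-tag query changes as the window slides; personalize would be called once per user profile on the shared list of hottest items. A real system would need far more efficient index structures to cope with the stream rates and profile counts the abstract mentions.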