We present a new method for mining and ranking streams of news stories using cross-stream sequential patterns and content similarity. In particular, we focus on stories that report the same event across the streams within a given time window, where an event is defined as a specific thing that happens at a specific time and place. For every discovered cluster of stories reporting the same event, we create an itemset-sequence consisting of the stream identifiers of the stories in the cluster, ordered by the stories' timestamps. We also record the exact timestamps of the respective stories and the content similarities between them. We use the resulting collection of itemset-sequences for two tasks: (I) to discover recurrent temporal publishing patterns between the news streams in terms of frequent sequential patterns and content similarity, and (II) to rank the streams of news stories with respect to timeliness of reporting important events and content authority. We demonstrate the applicability of the presented method on a multi-stream of news stories gathered from RSS feeds of major world news agencies.
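As a rough illustration of the itemset-sequence representation described above, the following Python sketch turns one cluster of same-event stories into an itemset-sequence and computes the support of a candidate sequential pattern. It is not the authors' implementation. The names `Story`, `to_itemset_sequence`, and `support`, the stream identifiers, and the integer timestamps are all hypothetical. The sketch also assumes that stories sharing a timestamp fall into the same itemset, and it leaves out event clustering and content similarity entirely.

```python
from itertools import groupby
from typing import NamedTuple


class Story(NamedTuple):
    stream_id: str   # hypothetical identifier of the news stream
    timestamp: int   # hypothetical time unit, e.g. minutes into the window


def to_itemset_sequence(cluster):
    """Order a cluster's stories by time and group equal timestamps.

    Assumption: stories with the same timestamp form one itemset.
    """
    ordered = sorted(cluster, key=lambda s: s.timestamp)
    return [
        frozenset(s.stream_id for s in group)
        for _, group in groupby(ordered, key=lambda s: s.timestamp)
    ]


def contains(sequence, pattern):
    """True if every pattern itemset is a subset of a distinct itemset
    of the sequence, in the same order (greedy earliest match)."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def support(pattern, sequences):
    """Fraction of itemset-sequences that contain the pattern."""
    if not sequences:
        return 0.0
    return sum(contains(seq, pattern) for seq in sequences) / len(sequences)


# Three made-up clusters, each holding stories about one event.
clusters = [
    [Story("AP", 0), Story("Reuters", 5), Story("BBC", 5)],
    [Story("Reuters", 2), Story("AP", 1), Story("CNN", 9)],
    [Story("BBC", 3), Story("AP", 7)],
]
sequences = [to_itemset_sequence(c) for c in clusters]

# Pattern "AP publishes first, Reuters follows".
pattern = [frozenset({"AP"}), frozenset({"Reuters"})]
print(support(pattern, sequences))
```

In this toy data the pattern holds in the first two clusters but not the third. A frequent-pattern miner would enumerate such patterns and keep those whose support exceeds a chosen threshold.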