We present a new method for mining streams of news stories using cross-stream sequential patterns. Within a given time window, we cluster stories that report the same event across the streams. For every discovered cluster, we create an itemset-sequence consisting of the stream identifiers of the stories in the cluster, ordered by the stories' timestamps. For every such itemset-sequence we also record the exact timestamps and the content similarities between the respective stories. We use the resulting collection of itemset-sequences for two tasks: (I) discovering cross-stream dependencies in terms of frequent sequential publishing patterns and content similarity, and (II) ranking the news streams by the timeliness with which they report important events and by their content authority. We tested the applicability of the method on a collection of news streams gathered from major world news agencies.
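The core data transformation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: clustering is assumed to be already done, stories with identical timestamps are assumed to share one itemset, only frequent ordered stream pairs (length-2 patterns) are counted, and streams are ranked purely by mean delay behind the first report of each event. Content similarity, event importance, and content authority are omitted. `Story`, `itemset_sequence`, `frequent_ordered_pairs`, and `rank_by_timeliness` are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    stream: str     # stream identifier, e.g. a news agency (hypothetical)
    timestamp: int  # publication time in arbitrary units (assumed integer)


def itemset_sequence(cluster):
    """Turn one event cluster into an itemset-sequence: stream identifiers
    grouped by timestamp (assumption) and ordered by time."""
    by_time = defaultdict(set)
    for s in cluster:
        by_time[s.timestamp].add(s.stream)
    return [frozenset(by_time[t]) for t in sorted(by_time)]


def frequent_ordered_pairs(sequences, min_support):
    """Support of each ordered pair (a, b): the number of sequences in which
    stream a occurs in an earlier itemset than stream b. Only length-2
    patterns are counted in this sketch."""
    counts = Counter()
    for seq in sequences:
        seen = set()  # count each pair at most once per sequence
        for i, earlier in enumerate(seq):
            for later in seq[i + 1:]:
                for a in earlier:
                    for b in later:
                        if a != b:
                            seen.add((a, b))
        counts.update(seen)
    return {p: c for p, c in counts.items() if c >= min_support}


def rank_by_timeliness(clusters):
    """Rank streams by mean delay behind the first report of each event
    (smaller delay ranks higher; ties broken by stream name)."""
    delays = defaultdict(list)
    for cluster in clusters:
        first = min(s.timestamp for s in cluster)
        earliest = {}  # each stream's first story about this event
        for s in cluster:
            earliest[s.stream] = min(earliest.get(s.stream, s.timestamp), s.timestamp)
        for stream, t in earliest.items():
            delays[stream].append(t - first)
    mean = {st: sum(d) / len(d) for st, d in delays.items()}
    return sorted(mean, key=lambda st: (mean[st], st))


# Toy data: three event clusters over hypothetical streams A, B, C.
clusters = [
    [Story("A", 0), Story("B", 5), Story("C", 5)],
    [Story("A", 10), Story("C", 12)],
    [Story("B", 20), Story("A", 30), Story("C", 31)],
]
seqs = [itemset_sequence(c) for c in clusters]
print(frequent_ordered_pairs(seqs, min_support=2))  # {('A', 'C'): 3}
print(rank_by_timeliness(clusters))                 # ['B', 'A', 'C']
```

In the toy data, A precedes C in all three events, so (A, C) is the only pair meeting a support of 2. Note that a plain mean delay favors B even though B covers only two of the three events; a fuller ranking would also weigh event coverage and importance.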