Previous work on text mining has almost exclusively focused on a single stream. However, we often have available multiple text streams indexed by the same set of time points (called coordinated text streams), which offer new opportunities for text mining. For example, when a major event happens, news articles published by different agencies in different languages tend to cover the same event for a certain period, exhibiting a correlated bursty topic pattern across all the news article streams. In general, mining correlated bursty topic patterns from coordinated text streams can reveal interesting latent associations or events behind these streams. In this paper, we define and study this novel text mining problem. We propose a general probabilistic algorithm which can effectively discover correlated bursty patterns and their bursty periods across text streams even if the streams have completely different vocabularies (e.g., English vs. Chinese). Evaluation of the proposed method on a news data set and a literature data set shows that it can discover meaningful topic patterns from both data sets: the patterns discovered from the news data set accurately reveal the major common events covered in the two streams of news articles (in English and Chinese, respectively), while the patterns discovered from two database publication streams match well with the major research paradigm shifts in database research. Since the proposed method is general and does not require the streams to share vocabulary, it can be applied to any coordinated text streams to discover correlated topic patterns that burst in multiple streams in the same period.
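To make the notion of a correlated burst in coordinated streams concrete, the following toy sketch may help. It is not the probabilistic algorithm proposed in the paper: it swaps in a much simpler heuristic (per-word relative frequency, Pearson correlation, and a mean-plus-one-standard-deviation burst threshold). The two miniature streams, the function names, and the threshold parameter `k` are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev

# Toy illustration only: a simple frequency/threshold heuristic,
# NOT the probabilistic method described in the abstract.

def relative_frequency(stream, word):
    """Fraction of tokens equal to `word` at each time point."""
    series = []
    for tokens in stream:
        counts = Counter(tokens)
        series.append(counts[word] / len(tokens) if tokens else 0.0)
    return series

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)

def burst_points(series, k=1.0):
    """Time indices where the series exceeds mean + k * std (hypothetical threshold)."""
    threshold = mean(series) + k * pstdev(series)
    return {t for t, value in enumerate(series) if value > threshold}

# Two made-up coordinated streams: the same five time points,
# completely different vocabularies.
english = [
    ["market", "trade", "weather"],
    ["market", "weather", "sports"],
    ["earthquake", "earthquake", "rescue"],
    ["earthquake", "rescue", "market"],
    ["market", "sports", "weather"],
]
chinese = [
    ["市场", "天气", "体育"],
    ["市场", "贸易", "天气"],
    ["地震", "地震", "救援"],
    ["地震", "救援", "市场"],
    ["体育", "天气", "市场"],
]

en = relative_frequency(english, "earthquake")
zh = relative_frequency(chinese, "地震")
correlation = pearson(en, zh)
shared_burst = sorted(burst_points(en) & burst_points(zh))
print(correlation, shared_burst)
```

In this made-up data the two words share no characters, yet their frequency curves rise and fall together and both exceed the threshold at time index 2. That is the kind of signal a method for coordinated streams can exploit without a shared vocabulary. Unlike this sketch, which is told which word pair to compare, the paper's method aims to discover the patterns and their bursty periods itself.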