Mining named entities with temporally correlated bursts from multilingual web news streams

Authors:
Alexander Kotov; ChengXiang Zhai; Richard Sproat
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA; University of Illinois at Urbana-Champaign, Urbana, IL, USA; University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the fourth ACM international conference on Web search and data mining
Year:
2011

Abstract

In this work, we study a new text mining problem: discovering named entities with temporally correlated bursts of mention counts in multiple multilingual Web news streams. Mining such entities has many interesting and important applications, such as identifying latent events that attract the attention of online media in different countries and acquiring valuable linguistic knowledge in the form of transliterations. While mining "bursty" terms in a single text stream has been studied before, detecting terms with temporally correlated bursts in multilingual Web streams raises two new challenges: (i) correlated terms in different streams may have bursts whose intensities differ by orders of magnitude, and (ii) bursts of correlated terms may be separated by time gaps. We propose a two-stage method for mining items with temporally correlated bursts from multiple data streams that addresses both challenges. In the first stage, the temporal behavior of each entity is normalized by modeling it with a Markov-Modulated Poisson Process (MMPP). In the second stage, a dynamic programming algorithm discovers correlated bursts of different items, which may be separated by time gaps. We evaluated our method on the task of discovering transliterations of named entities from multilingual Web news streams. Experimental results indicate that our method not only effectively discovers named entities with correlated bursts in multilingual Web news streams but also outperforms two state-of-the-art baseline methods for unsupervised discovery of transliterations in static text collections.
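The first stage can be illustrated with a minimal sketch, assuming a discrete-time, two-state version of the MMPP (effectively a hidden Markov model with a low "base" Poisson rate and a high "burst" rate) and Viterbi decoding of the most likely state sequence. The function names, the choice of the mean daily count as the base rate, and the parameters `burst_ratio` and `p_switch` are hypothetical illustrations, not the authors' parameterization. Decoding maps each stream's raw counts to a 0/1 burst indicator, which puts streams with very different mention volumes on a common scale.

```python
import math
from typing import List, Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for a Poisson random variable with rate lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def mmpp_burst_states(counts: Sequence[int], burst_ratio: float = 3.0,
                      p_switch: float = 0.05) -> List[int]:
    """Viterbi-decode a hypothetical two-state MMPP over daily mention counts.

    State 0 emits at a base rate (here: the mean count, an assumption) and
    state 1 at burst_ratio times that rate. Returns 1 for days decoded as bursty.
    """
    if not counts:
        return []
    base = max(sum(counts) / len(counts), 1e-3)  # avoid log(0) for all-zero streams
    rates = (base, base * burst_ratio)
    log_stay, log_move = math.log(1 - p_switch), math.log(p_switch)
    # score[s]: best log-probability of any state path ending in state s
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    backptrs = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            stay, move = score[s] + log_stay, score[1 - s] + log_move
            ptr.append(s if stay >= move else 1 - s)  # best predecessor of s
            new_score.append(max(stay, move) + poisson_logpmf(k, rates[s]))
        score = new_score
        backptrs.append(ptr)
    # Trace the best path backwards from the most likely final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical daily mention counts of one entity in two language streams.
stream_a = [2, 1, 2, 1, 2, 30, 40, 35, 2, 1, 2, 1, 2, 2, 1, 2]
stream_b = [1, 0, 0, 1, 0, 0, 0, 9, 10, 9, 0, 1, 0, 0, 0, 1]
print(mmpp_burst_states(stream_a))
print(mmpp_burst_states(stream_b))
```

On this toy input both streams reduce to three-day bursts even though stream A peaks at 40 mentions per day and stream B at 10, which is the kind of intensity normalization the first stage is meant to provide.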
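The abstract does not spell out the second-stage recurrence, so this sketch swaps in a well-known stand-in: dynamic time warping (DTW) restricted to a band of `max_gap` time steps, applied to 0/1 burst indicators such as those an MMPP decoder would produce. The band lets a burst in one stream align with a burst up to `max_gap` steps earlier or later in the other, which is one way to tolerate time gaps. The functions `banded_dtw` and `bursts_correlated`, the `max_gap` and `tolerance` parameters, and the toy sequences are all hypothetical and are not the authors' algorithm.

```python
import math
from typing import Sequence


def banded_dtw(x: Sequence[int], y: Sequence[int], max_gap: int) -> float:
    """DTW distance between two 0/1 burst sequences, only allowing aligned
    positions with |i - j| <= max_gap (a Sakoe-Chiba band).

    Returns math.inf if the lengths differ by more than max_gap.
    """
    n, m = len(x), len(y)
    # d[i][j]: minimal total mismatch when aligning x[:i] with y[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - max_gap), min(m, i + max_gap) + 1):
            mismatch = abs(x[i - 1] - y[j - 1])  # 1 if only one stream is bursty
            d[i][j] = mismatch + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def bursts_correlated(x: Sequence[int], y: Sequence[int], max_gap: int,
                      tolerance: int = 0) -> bool:
    # Hypothetical decision rule: both items must burst at least once, and their
    # bursts must align within the band with at most `tolerance` mismatches.
    return any(x) and any(y) and banded_dtw(x, y, max_gap) <= tolerance


# Toy burst indicators: stream B bursts two days after stream A.
bursts_a = [0] * 5 + [1] * 3 + [0] * 8
bursts_b = [0] * 7 + [1] * 3 + [0] * 6

print(banded_dtw(bursts_a, bursts_b, max_gap=0))  # 4.0
print(banded_dtw(bursts_a, bursts_b, max_gap=2))  # 0.0
```

Without an allowed gap the pair looks uncorrelated (four mismatched days), while a two-day band aligns the bursts exactly. One caveat of this stand-in is that DTW can also stretch a short burst to match a longer one, so a real system would likely need a stricter alignment model.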