This paper focuses on mining the common concerns shared by different textual data sources, and on analyzing the topics specific to each source (their eigen topics), via infinite topic modelling. By incorporating non-parametric Bayesian approaches, our work performs well and accords better with reality because it avoids restrictive assumptions. We propose two extensions of the Dirichlet process (DP), the bidirectional stick-breaking process and the multi-branches process, both based on the stick-breaking construction, to model multiple sequences of probability measures in one process rather than simply combining several DPs. Building on this new perspective of the DP, we discover the common topics and eigen topics via infinite topic modelling in a simple way, without setting the number of topics in advance. The experiments are carried out on three corpora of BBC news, about the UK, the US and the China forum respectively. The results present the common concerns of these three regions and their eigen interests in other aspects.
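
For readers unfamiliar with the stick-breaking construction mentioned above, the following is a minimal sketch of the standard, truncated version for a single DP. It is not the bidirectional or multi-branches process proposed in the paper, whose details are not given in this abstract. The function name, the concentration parameter `alpha`, the truncation level and the seed are all assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw truncated stick-breaking weights for a DP with concentration alpha.

    Illustrative only: this is the textbook single-DP construction,
    not the extended processes described in the abstract.
    """
    rng = random.Random(seed)
    remaining = 1.0  # length of the stick not yet assigned
    weights = []
    for _ in range(truncation - 1):
        # Break off a Beta(1, alpha) fraction of what is left.
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    # Give the leftover stick to the last component so the weights sum to 1.
    weights.append(remaining)
    return weights


# Hypothetical settings: 20 candidate topics, concentration 2.0.
weights = stick_breaking_weights(alpha=2.0, truncation=20)
```

Each weight can be read as the prior probability of one topic. A smaller `alpha` concentrates the mass on the first few topics, which is how a model of this kind can leave the effective number of topics to be determined by the data.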