Bursty and hierarchical structure in streams
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Investigating the relationship between language model perplexity and IR precision-recall measures
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
The Journal of Machine Learning Research
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Kruskal's condition for uniqueness in Candecomp/Parafac when ranks and k-ranks coincide
Computational Statistics & Data Analysis
Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation
Topic models vs. unstructured data
Communications of the ACM
Event detection with spatial latent Dirichlet allocation
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
A PAM-based ontology concept and hierarchy learning method
Journal of Information Science
Hi-index | 0.00 |
In the information retrieval field, effective and efficient extraction of topics from large-scale online text streams is challenging because it is a fully unsupervised learning task without prior knowledge. Most previous studies have focused on how to analyse text corpus to extract topics, rarely considering time dimensions. In the present study, we approached topic detection as a temporal optimization problem. Here, we propose a novel approach to incremental topic detection, called online topic detection using tensor factorization (OTD-TF), which is based on latent Dirichlet allocation (LDA). First, topics are obtained from the corpus in current time slices using LDA. Second, a topic tensor with a time dimension is constructed to identify the correlations between pairs of topics. Then, approximate topics are merged using TF. Finally, documents are reallocated to corresponding topic bins. By executing these steps continuously and incrementally, temporal topic detection can be achieved. In theoretical analyses and simulation experiments, OTD-TF outperformed other systems in terms of space and time complexity and achieved a high precision ratio. Our experimental evaluations also revealed interesting temporal patterns in topic emergence, development, extinction, burst and transience.