Fast mining and forecasting of complex time-stamped events

Authors:
Yasuko Matsubara;Yasushi Sakurai;Christos Faloutsos;Tomoharu Iwata;Masatoshi Yoshikawa
Affiliations:
Kyoto University, Kyoto, Japan;NTT Communication Science Labs, Kyoto, Japan;Carnegie Mellon University, Pittsburgh, PA, USA;NTT Communication Science Labs, Kyoto, Japan;Kyoto University, Kyoto, Japan
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012

Abstract

Given huge collections of time-evolving events such as web-click logs, which consist of multiple attributes (e.g., URL, userID, timestamp), how do we find patterns and trends? How do we capture daily patterns and forecast future events? We need two properties: (a) effectiveness, that is, the patterns should help us understand the data, discover groups, and enable forecasting, and (b) scalability, that is, the method should scale linearly with the data size. We introduce TriMine, which performs three-way mining for all three attributes, namely, URLs, users, and time. Specifically, TriMine discovers hidden topics, groups of URLs, and groups of users simultaneously. Thanks to its concise but effective summarization, it makes it possible to accomplish the most challenging and important task, namely, forecasting future events. Extensive experiments on real datasets demonstrate that TriMine discovers meaningful topics and makes long-range forecasts, which are notoriously difficult to achieve. In fact, TriMine consistently outperforms state-of-the-art methods in terms of accuracy and execution speed (up to 74x faster).
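
To make the idea of three-way mining concrete, the sketch below counts (URL, user, time) events into a three-way tensor and factorizes it into k latent groups. It is not the TriMine algorithm: the abstract does not describe the inference procedure, so a generic non-negative CP factorization with multiplicative updates is swapped in purely for illustration. The function names (`build_event_tensor`, `nonneg_cp`, `relative_error`), the toy event log, and the choice of k = 2 are all assumptions.

```python
import numpy as np


def build_event_tensor(events, n_urls, n_users, n_ticks):
    """Count (url, user, tick) events into a URL x user x time tensor."""
    X = np.zeros((n_urls, n_users, n_ticks))
    for url, user, tick in events:
        X[url, user, tick] += 1.0
    return X


def nonneg_cp(X, k, n_iter=200, seed=0, eps=1e-9):
    """Illustrative non-negative CP factorization (NOT the paper's method).

    Returns O (urls x k), A (users x k), C (ticks x k) such that
    X[i, j, t] is approximated by sum_r O[i, r] * A[j, r] * C[t, r].
    """
    rng = np.random.default_rng(seed)
    O = rng.random((X.shape[0], k)) + 0.1
    A = rng.random((X.shape[1], k)) + 0.1
    C = rng.random((X.shape[2], k)) + 0.1
    for _ in range(n_iter):
        # Multiplicative updates for squared error; each factor stays
        # non-negative because it is only ever scaled by a ratio >= 0.
        O *= np.einsum('ijt,jk,tk->ik', X, A, C) / (
            O @ ((A.T @ A) * (C.T @ C)) + eps)
        A *= np.einsum('ijt,ik,tk->jk', X, O, C) / (
            A @ ((O.T @ O) * (C.T @ C)) + eps)
        C *= np.einsum('ijt,ik,jk->tk', X, O, A) / (
            C @ ((O.T @ O) * (A.T @ A)) + eps)
    return O, A, C


def relative_error(X, O, A, C):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    X_hat = np.einsum('ik,jk,tk->ijt', O, A, C)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)


# Hypothetical toy log with two disjoint groups: URLs 0-1 are clicked by
# users 0-1 at ticks 0-1, and URLs 2-3 by users 2-3 at ticks 2-3.
events = [(u, v, t) for u in (0, 1) for v in (0, 1) for t in (0, 1)]
events += [(u, v, t) for u in (2, 3) for v in (2, 3) for t in (2, 3)]

X = build_event_tensor(events, n_urls=4, n_users=4, n_ticks=4)
O, A, C = nonneg_cp(X, k=2)
print("relative error:", relative_error(X, O, A, C))
print("dominant group per URL:", O.argmax(axis=1))
```

In this sketch the columns of `O`, `A`, and `C` play the roles of URL groups, user groups, and temporal activity per hidden topic, which is the kind of simultaneous summary the abstract describes. Forecasting would then operate on the time factor `C` rather than on the raw log, but how TriMine does so is not stated here.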