Online Algorithms for Mining Semi-structured Data Stream
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
In this paper, we study an online data mining problem over streams of semi-structured data such as XML. Modeling semi-structured data and patterns as labeled ordered trees, we present an online algorithm, StreamT, that receives fragments of unseen, possibly infinite semi-structured data in document order through a data stream and can return the current set of frequent patterns immediately on request at any time. We also adapt the algorithm to other online mining models. Furthermore, we implement our algorithms under different online models and candidate management strategies, and present empirical analyses evaluating them.
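To make the stream model concrete, here is a minimal sketch. It assumes that each XML node arrives in document order as a (depth, label) pair and that a pattern's frequency is its occurrence count divided by the number of nodes seen so far. The class `EdgePatternStream` and its methods are hypothetical names, and the sketch tracks only two-node (parent label, child label) patterns. It is not StreamT, which mines arbitrary labeled ordered tree patterns; it only illustrates how a preorder stream can be consumed incrementally while answering frequency queries at any time.

```python
from collections import Counter
from typing import List, Set, Tuple


class EdgePatternStream:
    """Hypothetical, simplified online miner over a preorder (depth, label) stream.

    Only two-node patterns (parent label, child label) are counted; this is
    an illustration of the streaming setting, not the StreamT algorithm.
    """

    def __init__(self) -> None:
        self.path: List[str] = []        # labels on the current rightmost path
        self.counts: Counter = Counter() # occurrences of each (parent, child) pattern
        self.nodes_seen = 0

    def receive(self, depth: int, label: str) -> None:
        # In document (preorder) order, a node at depth d is a child of the
        # rightmost-path node at depth d - 1, so truncate the path to length d.
        if depth > len(self.path):
            raise ValueError("depth jumps by more than one level")
        del self.path[depth:]
        if depth > 0:
            self.counts[(self.path[depth - 1], label)] += 1
        self.path.append(label)
        self.nodes_seen += 1

    def frequent(self, min_sup: float) -> Set[Tuple[str, str]]:
        # Can be called at any point in the stream: return the patterns whose
        # occurrence count is at least min_sup times the number of nodes seen.
        if self.nodes_seen == 0:
            return set()
        return {p for p, c in self.counts.items() if c >= min_sup * self.nodes_seen}
```

For the document `<a><b/><b><c/></b></a>`, the stream is `(0, "a"), (1, "b"), (1, "b"), (2, "c")`. After these four nodes, `("a", "b")` has occurred twice and `("b", "c")` once, so `frequent(0.5)` returns only `("a", "b")`. Keeping just the rightmost path makes each arrival cost proportional to the path truncation rather than to the size of the data seen so far.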