Efficient algorithms for finding frequent substructures from semi-structured data streams

Authors:
Tatsuya Asai; Kenji Abe; Shinji Kawasoe; Hiroki Arimura; Setsuo Arikawa
Affiliations:
Fujitsu Laboratories Ltd. and; Sharp Corp. and Kyushu University, Fukuoka, Japan; NTT Comware Corp. and Kyushu University, Fukuoka, Japan; Hokkaido University and Kyushu University, Fukuoka, Japan; Kyushu University, Fukuoka, Japan
Venue:
JSAI'03/JSAI04 Proceedings of the 2003 and 2004 international conference on New frontiers in artificial intelligence
Year:
2003


Abstract

In this paper, we study an online data mining problem over streams of semi-structured data such as XML. Modeling semi-structured data and patterns as labeled ordered trees, we present an online algorithm, StreamT, that receives fragments of an unseen, possibly infinite semi-structured document in document order through a data stream and can return the current set of frequent patterns immediately on request at any time. We also give modifications of the algorithm for other online mining models. Furthermore, we implement our algorithms under different online models and candidate management strategies, and present empirical analyses to evaluate them.
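
To make the setting concrete, the following is a minimal Python sketch of the online interface the abstract describes: nodes of a labeled ordered tree arrive one at a time in document order, and the current frequent patterns can be requested at any point. It is not the StreamT algorithm. It only counts single-edge patterns (parent label, child label), and the class name `EdgePatternStream`, the `(depth, label)` node encoding, and the relative `min_support` threshold are all assumptions made for illustration.

```python
from collections import Counter


class EdgePatternStream:
    """Toy online miner (hypothetical, not StreamT).

    Counts single-edge patterns (parent label, child label) in a labeled
    ordered tree whose nodes arrive in document (preorder) order.
    """

    def __init__(self, min_support):
        self.min_support = min_support  # assumed: fraction of nodes seen so far
        self.counts = Counter()         # occurrences of each edge pattern
        self.nodes_seen = 0
        self.branch = []                # labels on the path from root to latest node

    def push(self, depth, label):
        """Consume the next node; the root has depth 0."""
        if depth > len(self.branch):
            raise ValueError("node depth skips a level")
        # A node at this depth closes every open node at the same or greater depth.
        del self.branch[depth:]
        if self.branch:
            self.counts[(self.branch[-1], label)] += 1
        self.branch.append(label)
        self.nodes_seen += 1

    def frequent(self):
        """Return the edge patterns that are frequent at this moment."""
        if self.nodes_seen == 0:
            return {}
        threshold = self.min_support * self.nodes_seen
        return {p: c for p, c in self.counts.items() if c >= threshold}


if __name__ == "__main__":
    miner = EdgePatternStream(min_support=0.3)
    stream = [(0, "a"), (1, "b"), (2, "c"), (1, "b"), (2, "c"), (1, "d")]
    for depth, label in stream:
        miner.push(depth, label)
        print(miner.nodes_seen, miner.frequent())  # answer available after every node
```

Only the path from the root to the most recent node is kept in memory, so the sketch needs space proportional to the tree depth plus the number of distinct edge patterns rather than to the length of the stream. Mining larger tree patterns, as the paper does, would require considerably more machinery for candidate management.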