Discovery of Frequent Tree Structured Patterns in Semistructured Web Documents

Authors:
Tetsuhiro Miyahara;Takayoshi Shoudai;Tomoyuki Uchida;Kenichi Takahashi;Hiroaki Ueda
Venue:
PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Year:
2001

References (6)

Extracting schema from semistructured data. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Data on the Web: from relations to semistructured data and XML
Discovering Structural Association of Semistructured Data. IEEE Transactions on Knowledge and Data Engineering
Optimizing Regular Path Expressions Using Graph Schemas. ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
KD-FGS: A Knowledge Discovery System from Graph Data Using Formal Graph System. PAKDD '99 Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining
Polynomial Time Matching Algorithms for Tree-Like Structured Patterns in Knowledge Discovery. PAKDD '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications

Cited by (9)

Optimized Substructure Discovery for Semi-structured Data. PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Extracting Characteristic Structures among Words in Semistructured Documents. PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Discovery of Frequent Tag Tree Patterns in Semistructured Web Documents. PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Ordered Term Tree Languages which Are Polynomial Time Inductively Inferable from Positive Data. ALT '02 Proceedings of the 13th International Conference on Algorithmic Learning Theory
Polynomial Time Algorithms for Finding Unordered Tree Patterns with Internal Variables. FCT '01 Proceedings of the 13th International Symposium on Fundamentals of Computation Theory
Polynomial Time Inductive Inference of Ordered Tree Patterns with Internal Structured Variables from Positive Data. COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Recommending structure in collaborative semistructured information systems. Proceedings of the fourth ACM conference on Recommender systems
Mining frequent trees based on topology projection. APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
Sequential pattern mining for structure-based XML document classification. INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval

Abstract

Many documents, such as Web documents and XML files, have no rigid structure, and the number of such semistructured documents is increasing rapidly. We propose a new method for discovering frequent tree-structured patterns in semistructured Web documents. Specifically, we consider the data mining problem of finding all maximally frequent tag tree patterns in semistructured data such as Web documents. A tag tree pattern is an edge-labeled tree that has hyperedges as variables. An edge label is a tag or a keyword in Web documents, and a variable can be replaced by any tree, so a tag tree pattern is well suited to representing tree-structured patterns in semistructured Web documents. We present an algorithm for finding all maximally frequent tag tree patterns and report experimental results obtained by applying it to XML documents.
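To make the notion of a tag tree pattern concrete, the following is a minimal Python sketch under simplifying assumptions; it is not the algorithm of the paper. Trees are ordered lists of (edge label, subtree) pairs, edge labels are XML tags only (keywords in text are ignored), and a variable is restricted to a single edge that absorbs that edge and everything below it, which is narrower than the hyperedge variables described in the abstract. The names VAR, xml_to_tree, parse_document, matches and frequency are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical marker for a variable edge in a pattern (assumption of this sketch).
VAR = None


def xml_to_tree(elem):
    """Return the (edge_label, subtree) pairs below an XML element, in document order."""
    return [(child.tag, xml_to_tree(child)) for child in elem]


def parse_document(xml_text):
    """Parse XML text into a tree whose single top edge is labeled with the root tag."""
    root = ET.fromstring(xml_text)
    return [(root.tag, xml_to_tree(root))]


def matches(pattern, tree):
    """Check whether the pattern matches the tree edge by edge (ordered, same arity).

    Simplification: a variable edge matches any one edge together with its
    whole subtree, so nothing below a variable is inspected.
    """
    if len(pattern) != len(tree):
        return False
    for (p_label, p_sub), (t_label, t_sub) in zip(pattern, tree):
        if p_label is VAR:
            continue
        if p_label != t_label or not matches(p_sub, t_sub):
            return False
    return True


def frequency(pattern, documents):
    """Fraction of documents that the pattern matches."""
    hits = sum(matches(pattern, doc) for doc in documents)
    return hits / len(documents)


# Three small example documents (invented for illustration).
docs = [
    parse_document("<book><title/><author><name/></author></book>"),
    parse_document("<book><title/><editor/></book>"),
    parse_document("<article><title/></article>"),
]

# Pattern: a book with a title followed by one arbitrary subtree.
pattern = [("book", [("title", []), (VAR, [])])]

print(frequency(pattern, docs))  # 2 of the 3 documents match
```

In this simplified setting, a pattern would be called frequent when its frequency reaches a chosen minimum threshold, and maximally frequent when no more specific frequent pattern exists; the sketch only computes the frequency and does not search for maximal patterns.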