Optimized Substructure Discovery for Semi-structured Data

Authors:
Kenji Abe;Shinji Kawasoe;Tatsuya Asai;Hiroki Arimura;Setsuo Arikawa
Venue:
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Year:
2002

Abstract

In this paper, we consider the problem of discovering interesting substructures in a large collection of semi-structured data within the framework of optimized pattern discovery. We model semi-structured data and patterns as labeled ordered trees, and present an efficient algorithm that discovers the labeled ordered trees that optimize a given statistical measure, such as information entropy or classification accuracy, over a collection of semi-structured data. We give theoretical analyses of the computational complexity of the algorithm for patterns of bounded and unbounded size. Experiments show that the algorithm performs well and discovers interesting patterns on real datasets.
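
To make the notion of scoring a tree pattern by a statistical measure concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical representation of a labeled ordered tree as a `(label, children)` tuple, an assumed matching rule (labels must agree and the pattern's children must map, in order, onto a subsequence of a node's children), and binary class labels 0/1. It only evaluates the entropy of the split that one given pattern induces; it does not enumerate or prune candidate patterns. All function names are illustrative.

```python
from math import log2

# Hypothetical representation: a labeled ordered tree is (label, [child, ...]).

def matches_at(pattern, node):
    """Assumed matching rule: labels agree, and the pattern's children map
    in order onto a subsequence of the node's children."""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    i = 0  # index of the next pattern child still to be matched
    for child in n_children:
        if i < len(p_children) and matches_at(p_children[i], child):
            i += 1
    return i == len(p_children)

def occurs_in(pattern, tree):
    """True if the pattern matches at the root or at any descendant."""
    if matches_at(pattern, tree):
        return True
    return any(occurs_in(pattern, child) for child in tree[1])

def binary_entropy(p):
    """Entropy in bits of a Bernoulli distribution with parameter p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def split_entropy(pattern, labeled_trees):
    """Weighted class entropy after splitting the trees by pattern occurrence.
    labeled_trees is a list of (tree, class_label) pairs, class_label in {0, 1}."""
    matched = [y for t, y in labeled_trees if occurs_in(pattern, t)]
    unmatched = [y for t, y in labeled_trees if not occurs_in(pattern, t)]
    n = len(labeled_trees)
    total = 0.0
    for part in (matched, unmatched):
        if part:
            total += len(part) / n * binary_entropy(sum(part) / len(part))
    return total

# Toy data, invented purely for illustration.
data = [
    (("a", [("b", []), ("c", [])]), 1),
    (("a", [("c", []), ("b", [])]), 0),  # same labels, different sibling order
    (("r", [("a", [("b", []), ("d", []), ("c", [])])]), 1),
    (("a", [("b", [])]), 0),
]
pattern = ("a", [("b", []), ("c", [])])
score = split_entropy(pattern, data)
```

Under these assumptions, a lower `split_entropy` means the pattern separates the two classes more cleanly, so an optimized-pattern search would prefer patterns that minimize it. Because sibling order is respected, the second toy tree does not contain the pattern even though it has the same labels.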