A new efficient probabilistic model for mining labeled ordered trees

Authors:
Kosuke Hashimoto;Kiyoko F. Aoki-Kinoshita;Nobuhisa Ueda;Minoru Kanehisa;Hiroshi Mamitsuka
Affiliations:
Kyoto University, Gokasho Uji, Japan;Kyoto University, Gokasho Uji, Japan;Kyoto University, Gokasho Uji, Japan;Kyoto University, Gokasho Uji, Japan;Kyoto University, Gokasho Uji, Japan
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Citing 11
Cited 3

Probabilistic reasoning in intelligent systems: networks of plausible inference

Probabilistic reasoning in intelligent systems: networks of plausible inference
Foundations of statistical natural language processing

Foundations of statistical natural language processing
Data on the Web: from relations to semistructured data and XML

Data on the Web: from relations to semistructured data and XML
A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems

Machine Learning
Kernels for Semi-Structured Data

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Hidden Tree Markov Models for Document Image Classification

IEEE Transactions on Pattern Analysis and Machine Intelligence
XRules: an effective structural classifier for XML data

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Managing and analyzing carbohydrate data

ACM SIGMOD Record
Text Mining: Predictive Methods for Analyzing Unstructured Information

Text Mining: Predictive Methods for Analyzing Unstructured Information
Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications

IEEE Transactions on Knowledge and Data Engineering
A Probabilistic Model for Mining Labeled Ordered Trees: Capturing Patterns in Carbohydrate Sugar Chains

IEEE Transactions on Knowledge and Data Engineering

Out-of-core coherent closed quasi-clique mining from large dense graph databases

ACM Transactions on Database Systems (TODS)
A new efficient probabilistic model for mining labeled ordered trees applied to glycobiology

ACM Transactions on Knowledge Discovery from Data (TKDD)
Multiple tree alignment with weights applied to carbohydrates to extract binding recognition patterns

PRIB'12 Proceedings of the 7th IAPR international conference on Pattern Recognition in Bioinformatics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Mining frequent patterns is a general and important issue in data mining. Complex and unstructured (or semi-structured) datasets have appeared in major data mining applications, including text mining, web mining and bioinformatics. Mining patterns from these datasets is the focus of many of the current data mining approaches. We focus on labeled ordered trees, typical datasets of semi-structured data in data mining, and propose a new probabilistic model and its efficient learning scheme for mining labeled ordered trees. The proposed approach significantly improves the time and space complexity of an existing probabilistic modeling for labeled ordered trees, while maintaining its expressive power. We evaluated the performance of the proposed model, comparing it with that of the existing model, using synthetic as well as real datasets from the field of glycobiology. Experimental results showed that the proposed model drastically reduced the computation time of the competing model, keeping the predictive power and avoiding overfitting to the training data. Finally, we assessed our results using the proposed model on real data from a variety of biological viewpoints, verifying known facts in glycobiology.