A new efficient probabilistic model for mining labeled ordered trees applied to glycobiology

Authors:
Kosuke Hashimoto;Kiyoko Flora Aoki-Kinoshita;Nobuhisa Ueda;Minoru Kanehisa;Hiroshi Mamitsuka
Affiliations:
Institute for Chemical Research, Kyoto University, Japan;Institute for Chemical Research, Kyoto University, Japan;Institute for Chemical Research, Kyoto University, Japan;Institute for Chemical Research, Kyoto University, Japan;Institute for Chemical Research, Kyoto University, Japan
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2008

Abstract

Mining frequent patterns from large datasets is an important issue in data mining. Recently, complex unstructured and semi-structured datasets have become targets of major data mining applications, including text mining, web mining, and bioinformatics. Our work focuses on labeled ordered trees, a typical form of semi-structured data. In bioinformatics, carbohydrate sugar chains, or glycans, can be modeled as labeled ordered trees. Glycans are the third major class of biomolecules and play important roles in signaling and recognition. For mining labeled ordered trees, we propose a new probabilistic model and an efficient learning scheme for it, which significantly improve on the time and space complexity of an existing probabilistic model for labeled ordered trees. We evaluated the performance of the proposed model against other probabilistic models, using synthetic datasets as well as real datasets from glycobiology. Experimental results showed that the proposed model required drastically less computation time than the competing model while retaining predictive power and avoiding overfitting to the training data. Finally, we assessed our results on real data from a variety of biological viewpoints, verifying known facts in glycobiology.
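
To make the data structure concrete, the following is a minimal sketch of a labeled ordered tree, using the common N-glycan core (two GlcNAc and three Man residues) as the example. It illustrates only the representation, not the probabilistic model proposed in the paper. The names `Node`, `preorder`, and `n_glycan_core` are hypothetical, and the left-to-right order given to the two branching mannoses is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One vertex of a labeled ordered tree (hypothetical helper type)."""
    label: str                                   # e.g. a monosaccharide name
    children: List["Node"] = field(default_factory=list)  # order is significant


def preorder(node: Node) -> Iterator[str]:
    """Yield labels parent-first, visiting children left to right."""
    yield node.label
    for child in node.children:
        yield from preorder(child)


# N-glycan core, rooted at the reducing-end GlcNAc.
# Assumption: sibling order of the two branch mannoses is illustrative only.
n_glycan_core = Node("GlcNAc", [
    Node("GlcNAc", [
        Node("Man", [
            Node("Man"),   # one branch
            Node("Man"),   # the other branch
        ]),
    ]),
])

labels = list(preorder(n_glycan_core))

# Because the tree is ordered, swapping siblings gives a different tree.
left_first = Node("Man", [Node("Gal"), Node("Fuc")])
right_first = Node("Man", [Node("Fuc"), Node("Gal")])
ordered_trees_differ = left_first != right_first
```

Here `labels` lists the five residues from the root outward, and `ordered_trees_differ` is true because the dataclass comparison checks the child lists position by position, which is what distinguishes an ordered tree from an unordered one.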