A Probabilistic Model for Mining Labeled Ordered Trees: Capturing Patterns in Carbohydrate Sugar Chains

Authors:
Nobuhisa Ueda;Kiyoko F. Aoki-Kinoshita;Atsuko Yamaguchi;Tatsuya Akutsu;Hiroshi Mamitsuka
Affiliations:
-;IEEE;-;-;IEEE
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2005

Citing 20
Cited 5

Probabilistic reasoning in intelligent systems: networks of plausible inference

Fundamentals of speech recognition

Predicting Protein Secondary Structure Using Stochastic Tree Grammars

Machine Learning - Special issue on learning with probabilistic representations
Data on the Web: from relations to semistructured data and XML

The Hierarchical Hidden Markov Model: Analysis and Applications

Machine Learning
Exploiting generative models in discriminative classifiers

Proceedings of the 1998 conference on Advances in neural information processing systems II
A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems

Machine Learning
Kernels for Semi-Structured Data

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
A graphical environment for change detection in structured documents

COMPSAC '97 Proceedings of the 21st International Computer Software and Applications Conference
Simplified Training Algorithms for Hierarchical Hidden Markov Models

DS '01 Proceedings of the 4th International Conference on Discovery Science
Hidden Tree Markov Models for Document Image Classification

IEEE Transactions on Pattern Analysis and Machine Intelligence
Online Algorithms for Mining Semi-structured Data Stream

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
TreeFinder: a First Step towards XML Data Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Model-Based Clustering and Visualization of Navigation Patterns on a Web Site

Data Mining and Knowledge Discovery
XRules: an effective structural classifier for XML data

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Cyclic pattern kernels for predictive graph mining

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Extensions of marginalized graph kernels

ICML '04 Proceedings of the twenty-first international conference on Machine learning
Managing and analyzing carbohydrate data

ACM SIGMOD Record
Application of a new probabilistic model for recognizing complex patterns in glycans

Bioinformatics
Wavelet-based statistical signal processing using hidden Markov models

IEEE Transactions on Signal Processing

A new efficient probabilistic model for mining labeled ordered trees

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
A new efficient probabilistic model for mining labeled ordered trees applied to glycobiology

ACM Transactions on Knowledge Discovery from Data (TKDD)
A 6-approximation algorithm for computing smallest common AoN-supertree with application to the reconstruction of glycan trees

ISAAC'06 Proceedings of the 17th international conference on Algorithms and Computation
Multiple tree alignment with weights applied to carbohydrates to extract binding recognition patterns

PRIB'12 Proceedings of the 7th IAPR international conference on Pattern Recognition in Bioinformatics
Coloring based approach for matching unrooted and/or unordered trees

Pattern Recognition Letters


Abstract

Glycans, or carbohydrate sugar chains, which play a number of important roles in the development and functioning of multicellular organisms, can be regarded as labeled ordered trees. A recent increase in the documentation of glycan structures, especially in the form of database curation, has made mining glycans important for the understanding of living cells. We propose a probabilistic model for mining labeled ordered trees, and we further present an efficient learning algorithm for this model, based on an EM algorithm. The time and space complexities of this algorithm are rather favorable, falling within the practical limits set by a variety of existing probabilistic models, including stochastic context-free grammars. Experimental results have shown that, in a supervised problem setting, the proposed method outperformed five other competing methods by a statistically significant margin in all cases. We further applied the proposed method to aligning multiple glycan trees, and we detected biologically significant common subtrees in these alignments where the trees are automatically classified into subtypes already known in glycobiology. Extended abstracts of parts of the work presented in this paper have appeared in [35], [4], and [3].