Managing and analyzing carbohydrate data

Authors:
Kiyoko F. Aoki;Nobuhisa Ueda;Atsuko Yamaguchi;Tatsuya Akutsu;Minoru Kanehisa;Hiroshi Mamitsuka
Affiliations:
Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan
Venue:
ACM SIGMOD Record
Year:
2004

Citing 8
Cited 4

Fundamentals of speech recognition

Fundamentals of speech recognition
The Tree-to-Tree Correction Problem

Journal of the ACM (JACM)
The Hierarchical Hidden Markov Model: Analysis and Applications

Machine Learning
Edit distance between two RNA structures

RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Kernels for Semi-Structured Data

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Hidden Tree Markov Models for Document Image Classification

IEEE Transactions on Pattern Analysis and Machine Intelligence
Online Algorithms for Mining Semi-structured Data Stream

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
XRules: an effective structural classifier for XML data

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

A Probabilistic Model for Mining Labeled Ordered Trees: Capturing Patterns in Carbohydrate Sugar Chains

IEEE Transactions on Knowledge and Data Engineering
A new efficient probabilistic model for mining labeled ordered trees

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
A new efficient probabilistic model for mining labeled ordered trees applied to glycobiology

ACM Transactions on Knowledge Discovery from Data (TKDD)
An efficient unordered tree kernel and its application to glycan classification

PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

One of the most vital molecules in multicellular organisms is the carbohydrate, as it is structurally important in the construction of such organisms. In fact, all cells in nature carry carbohydrate sugar chains, or glycans, that help modulate various cell-cell events for the development of the organism. Unfortunately, informatics research on glycans has been slow in comparison to DNA and proteins, largely due to difficulties in the biological analysis of glycan structures. Our work consists of data engineering approaches in order to glean some understanding of the current glycan data that is publicly available. In particular, by modeling glycans as labeled unordered trees, we have implemented a tree-matching algorithm for measuring tree similarity. Our algorithm utilizes proven efficient methodologies in computer science that has been extended and developed for glycan data. Moreover, since glycans are recognized by various agents in multicellular organisms, in order to capture the patterns that might be recognized, we needed to somehow capture the dependencies that seem to range beyond the directly connected nodes in a tree. Therefore, by defining glycans as labeled ordered trees, we were able to develop a new probabilistic tree model such that sibling patterns across a tree could be mined. We provide promising results from our methodologies that could prove useful for the future of glycome informatics.