Conceptual Clustering of Heterogeneous Gene Expression Sequences

  • Authors:
  • Sally McClean; Bryan Scotney; Steve Robinson

  • Affiliations:
  • School of Computing and Information Engineering,
  • School of Computing and Information Engineering, Faculty of Informatics, University of Ulster, Cromore Road, Coleraine, BT52 1SA, Northern Ireland (E-mail: bw.scotney@ulster.ac.uk ...
  • School of Computing and Information Engineering, Faculty of Informatics, University of Ulster, Cromore Road, Coleraine, BT52 1SA, Northern Ireland (E-mail: s.robinson@ulster.ac.uk ...

  • Venue:
  • Artificial Intelligence Review
  • Year:
  • 2003

Abstract

We are concerned with clustering and characterising gene expression sequences that have been classified according to heterogeneous classification schemes. We adopt a model-based approach that uses a Hidden Markov Model (HMM) whose states are the stages of the underlying process that generates the gene sequences, which allows us to handle complex and heterogeneous data. Each cluster is described in terms of an HMM, and we seek schema mappings between the states of the original sequences and the states of that HMM.

The general solution that we propose involves several distinct tasks. Firstly, there is a clustering problem in which we seek to group similar sequences; for this we use mutual entropy to identify associations between sequence states. Secondly, because we are clustering heterogeneous sequences, we must determine the mappings between the states of each sequence in a cluster and the states of an underlying hidden process; for this we compute the most probable mapping. Thirdly, using these mappings, we employ maximum likelihood techniques to learn the probabilistic description of the hidden Markov process for each cluster. Fourthly, we use these descriptions to characterise the clusters, applying Dynamic Programming to determine the most probable pathway for each cluster. Finally, we derive linguistic labels that describe the clusters in a user-friendly manner. This approach provides an intuitive way of describing the underlying shape of the process by explicitly modelling the temporal aspects of the data. Non-time-homogeneous HMMs are used to capture the full temporal semantics.
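The third and fourth tasks in the abstract (maximum likelihood estimation of a per-cluster Markov process, then Dynamic Programming for the most probable pathway) can be illustrated with a short sketch. This is not the authors' implementation. It assumes the schema mappings have already been applied, so that every sequence in a cluster is expressed in one shared set of states and all sequences have the same length. The function names `estimate_transitions` and `most_probable_pathway`, the state labels, and the toy data are hypothetical. Because the process is non-time-homogeneous, a separate transition table is estimated for each time step t: the ML estimate of P(X_{t+1} = j | X_t = i) is the number of i-to-j transitions at step t divided by the number of sequences in state i at step t.

```python
import math

# Hypothetical sketch: assumes the schema mappings have already been applied,
# so every sequence uses the shared states and all have equal length.


def estimate_transitions(sequences, states):
    """Maximum likelihood estimates for a non-time-homogeneous Markov chain.

    Returns (initial, transitions), where transitions[t][i][j] estimates
    P(X_{t+1} = j | X_t = i) separately for each time step t.
    """
    length = len(sequences[0])
    init_counts = {s: 0 for s in states}
    trans_counts = [{i: {j: 0 for j in states} for i in states}
                    for _ in range(length - 1)]
    for seq in sequences:
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        init_counts[seq[0]] += 1
        for t in range(length - 1):
            trans_counts[t][seq[t]][seq[t + 1]] += 1

    n = len(sequences)
    initial = {s: c / n for s, c in init_counts.items()}
    transitions = []
    for counts in trans_counts:
        step = {}
        for i in states:
            total = sum(counts[i].values())
            # A state never visited at step t gets an all-zero row.
            step[i] = {j: (c / total if total else 0.0)
                       for j, c in counts[i].items()}
        transitions.append(step)
    return initial, transitions


def _log(p):
    # log(0) is treated as -inf so impossible transitions are never chosen.
    return math.log(p) if p > 0 else float("-inf")


def most_probable_pathway(initial, transitions, states):
    """Dynamic programming (Viterbi-style) over the fitted chain.

    Returns the single most probable state path and its probability.
    """
    # delta[s]: log-probability of the best path ending in state s so far.
    delta = {s: _log(initial[s]) for s in states}
    backpointers = []
    for step in transitions:
        new_delta, pointers = {}, {}
        for j in states:
            best_i = max(states, key=lambda i: delta[i] + _log(step[i][j]))
            new_delta[j] = delta[best_i] + _log(step[best_i][j])
            pointers[j] = best_i
        delta = new_delta
        backpointers.append(pointers)

    # Trace the best final state back through the stored pointers.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(delta[last])


if __name__ == "__main__":
    # Toy, made-up expression levels for one cluster (not data from the paper).
    states = ["low", "medium", "high"]
    cluster = [
        ["low", "medium", "high"],
        ["low", "medium", "high"],
        ["low", "high", "high"],
        ["medium", "medium", "low"],
    ]
    initial, transitions = estimate_transitions(cluster, states)
    path, prob = most_probable_pathway(initial, transitions, states)
    print(path, round(prob, 4))  # ['low', 'medium', 'high'] 0.3333
```

Keeping one transition table per time step, rather than pooling the counts over time, is what makes the fitted chain non-time-homogeneous; pooling them would give the time-homogeneous special case. Working in log-probabilities avoids numerical underflow on long sequences.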