We are concerned with clustering and characterising gene expression sequences that have been classified according to heterogeneous classification schemes. We adopt a model-based approach that uses a Hidden Markov Model (HMM) whose states are the stages of the underlying process that generates the gene sequences, thus allowing us to handle complex and heterogeneous data. Each cluster is described in terms of an HMM, where we seek to find schema mappings between the states of the original sequences and the states of the HMM.

The general solution that we propose involves several distinct tasks. Firstly, there is a clustering problem where we seek to group similar sequences; for this we use mutual entropy to identify associations between sequence states. Secondly, because we are concerned with clustering heterogeneous sequences, we must determine the mappings between the states of each sequence in a cluster and the states of an underlying hidden process; for this we compute the most probable mapping. Thirdly, using these mappings we employ maximum likelihood techniques to learn the probabilistic description of the hidden Markov process for each cluster. Fourthly, we use these descriptions to characterise the clusters, using dynamic programming to determine the most probable pathway for each cluster. Finally, we derive linguistic labels to describe the clusters in a user-friendly manner. Such an approach provides an intuitive way of describing the underlying shape of the process by explicitly modelling the temporal aspects of the data. Non-time-homogeneous HMMs are used to capture the full temporal semantics.
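The third and fourth tasks can be illustrated with a minimal sketch. It is not the authors' implementation and makes simplifying assumptions: the sequences in a cluster are taken to be already mapped onto integer hidden states, the chain is treated as time-homogeneous (the abstract uses non-time-homogeneous models), and there are no emission probabilities. The function names `fit_markov_chain` and `most_probable_path` are hypothetical.

```python
def fit_markov_chain(sequences, n_states):
    """Maximum likelihood estimates of the initial and transition
    probabilities, obtained by counting (assumes a time-homogeneous chain)."""
    init_counts = [0] * n_states
    trans_counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        init_counts[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans_counts[a][b] += 1
    total = sum(init_counts)
    init = [c / total for c in init_counts]
    trans = []
    for row in trans_counts:
        row_sum = sum(row)
        # States never left in the data get an all-zero row.
        trans.append([c / row_sum if row_sum else 0.0 for c in row])
    return init, trans


def most_probable_path(init, trans, length):
    """Dynamic programming over the chain: return the most probable
    state path of the given length and its probability."""
    n = len(init)
    best = list(init)   # best[j]: probability of the best path ending in j
    back = []           # back[t][j]: predecessor of j on that best path
    for _ in range(length - 1):
        step, new_best = [], []
        for j in range(n):
            i_star = max(range(n), key=lambda i: best[i] * trans[i][j])
            step.append(i_star)
            new_best.append(best[i_star] * trans[i_star][j])
        back.append(step)
        best = new_best
    last = max(range(n), key=lambda j: best[j])
    path = [last]
    for step in reversed(back):
        path.append(step[path[-1]])
    path.reverse()
    return path, best[last]


# Illustrative data only: three mapped sequences over states 0, 1, 2.
init, trans = fit_markov_chain([[0, 1, 2], [0, 1, 2], [0, 0, 2]], n_states=3)
path, prob = most_probable_path(init, trans, length=3)
print(path, prob)
```

A non-time-homogeneous variant would keep one transition matrix per time step instead of a single shared matrix; the counting and the dynamic-programming recursion otherwise have the same shape.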