Discovering significant relaxed order-preserving submatrices

Authors:
Qiong Fang;Wilfred Ng;Jianlin Feng
Affiliations:
Hong Kong University of Science and Technology, Hong Kong, Hong Kong;Hong Kong University of Science and Technology, Hong Kong, Hong Kong;Sun Yat-Sen University, Guangzhou, China
Venue:
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2010

Citing 10
Cited 1

Discovering local structure in gene expression data: the order-preserving submatrix problem

Proceedings of the sixth annual international conference on Computational biology
Biclustering of Expression Data

Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
A Survey of Longest Common Subsequence Algorithms

SPIRE '00 Proceedings of the Seventh International Symposium on String Processing Information Retrieval (SPIRE'00)
Efficient similarity search and classification via rank aggregation

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
OP-Cluster: Clustering by Tendency in High Dimensional Space

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Biclustering Algorithms for Biological Data Analysis: A Survey

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Discovering significant OPSM subspace clusters in massive gene expression data

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
A systematic comparison and evaluation of biclustering methods for gene expression data

Bioinformatics
Discovering bucket orders from full rankings

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Mining Approximate Order Preserving Clusters in the Presence of Noise

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering

Mining order-preserving submatrices from probabilistic matrices

ACM Transactions on Database Systems (TODS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Mining order-preserving submatrix (OPSM) patterns has received much attention from researchers, since in many scientific applications, such as those involving gene expression data, it is natural to express the data in a matrix and also important to find the order-preserving submatrix patterns. However, most current work assumes the noise-free OPSM model and thus is not practical in many real situations when sample contamination exists. In this paper, we propose a relaxed OPSM model called ROPSM. The ROPSM model supports mining more reasonable noise-corrupted OPSM patterns than another well-known model called AOPC (approximate order-preserving cluster). While OPSM mining is known to be an NP-hard problem, mining ROPSM patterns is even a harder problem. We propose a novel method called ROPSM-Growth to mine ROPSM patterns. Specifically, two pattern growing strategies, such as column-centric strategy and row-centric strategy, are presented, which are effective to grow the seed OPSMs into significant ROPSMs. An effective median-rank based method is also developed to discover the underlying true order of conditions involved in an ROPSM pattern. Our experiments on a biological dataset show that the ROPSM model better captures the characteristics of noise in gene expression data matrix compared to the AOPC model. Importantly, we find that our approach is able to detect more quality biologically significant patterns with comparable efficiency with the counterparts of AOPC. Specifically, at least 26.6% (75 out of 282) of the patterns mined by our approach are strongly associated with more than 10 gene categories (high biological significance), which is 3 times better than that obtained from using the AOPC approach.