OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Data mining: concepts and techniques
Data mining: concepts and techniques
Efficient and Effective Clustering Methods for Spatial Data Mining
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Automatic extraction of clusters from hierarchical clustering representations
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
User Modeling and User-Adapted Interaction
Cancer classification from serial analysis of gene expression with event models
Applied Intelligence
Medical Imaging and Informatics
Brief communication: A Poisson-based adaptive affinity propagation clustering for SAGE data
Computational Biology and Chemistry
Mining outliers in spatial networks
DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications
BioDM'06 Proceedings of the 2006 international conference on Data Mining for Biomedical Applications
Kernel independent component analysis for gene expression data clustering
ICA'06 Proceedings of the 6th international conference on Independent Component Analysis and Blind Signal Separation
Event models for tumor classification with SAGE gene expression data
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Hi-index | 0.00 |
Serial Analysis of Gene Expression (SAGE) has proven to be an important alternative to microarray techniques for global profiling of mRNA populations. We have developed preprocessing methodologies to address problems in analyzing SAGE data due to noise caused by sequencing error, normalization methodologies to account for libraries sampled at different depths, and missing tag imputation methodologies to aid in the analysis of poorly sampled SAGE libraries. We have also used subspace selection using the Wilcoxon rank sum test to exclude tags that have similar expression levels regardless of source. Using these methodologies we have clustered, using the OPTICS algorithm, 88 SAGE libraries derived from cancerous and normal tissues as well as cell line material. Our results produced eight dense clusters representing ovarian cancer cell line, brain cancer cell line, brain cancer bulk tissue, prostate tissue, pancreatic cancer, breast cancer cell line, normal brain, and normal breast bulk tissue. The ovarian cancer and brain cancer cell lines clustered closely together, leading to a further investigation on possible associations between these two cancer types. We also investigated the utility of gene expression data in the classification between normal and cancerous tissues. Our results indicate that brain and breast cancer libraries have strong identities allowing robust discrimination from their normal counterparts. However, the SAGE expression data provide poor predictive accuracy in discriminating between prostate and ovarian cancers and their respective normal tissues.