Visualizing the protein sequence universe

Authors:
Larissa Stanberry;Roger Higdon;Winston Haynes;Natali Kolker;William Broomall;Saliya Ekanayake;Adam Hughes;Yang Ruan;Judy Qiu;Eugene Kolker;Geoffrey Fox
Affiliations:
Seattle Children's Research Institute, Seattle, USA;Seattle Children's Research Institute, Seattle, USA;Seattle Children's Research Institute, Seattle, USA;Seattle Children's Research Institute, Seattle, USA;Seattle Children's Research Institute, Seattle, USA;Indiana University, Bloomington, USA;Indiana University, Bloomington, USA;Indiana University, Bloomington, USA;Indiana University, Bloomington, USA;Seattle Children's Research Institute, Seattle, USA;Indiana University, Bloomington, USA
Venue:
Proceedings of the 3rd international workshop on Emerging computational methods for the life sciences
Year:
2012

Visualization

Abstract

Modern biology is experiencing a rapid increase in data volumes that challenges our analytical skills and existing cyberinfrastructure. Exponential expansion of the Protein Sequence Universe (PSU), the protein sequence space, together with the costs and complexities of manual curation creates a major bottleneck in life sciences research. Existing resources lack scalable visualization tools that are instrumental for functional annotation. Here, we describe a multi-dimensional scaling (MDS) implementation to create a 3D embedding of the PSU that allows visualizing the relationships between large numbers of proteins. To demonstrate the method, we use sequence similarity scores as a measure of proximity. An example of the prokaryotic PSU shows that the low-dimensional representation preserves important grouping features such as relative proximity of functionally similar clusters and clear structural separation between clusters with specific and general functions. The advantages of the method and its implementation include the ability to scale to large numbers of sequences, integrate different similarity measures with other functional and experimental data, and facilitate protein annotation. Transdisciplinary approaches akin to the one described in this paper are urgently needed to quickly and efficiently translate the influx of new data into tangible innovations and groundbreaking discoveries.
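
As a rough illustration of the idea in the abstract (turning pairwise sequence similarity scores into a 3D embedding with MDS), the following is a minimal sketch and not the authors' implementation. It assumes a small, made-up similarity matrix already normalized to [0, 1], converts it to dissimilarities with the simple rule 1 - similarity (an assumption; the abstract does not state a conversion), and minimizes raw stress with plain SMACOF iterations (the Guttman transform). The function names `pairwise_distances`, `stress`, and `smacof_embed` are hypothetical, and the code makes no attempt at the scalability the paper targets.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def stress(dissim, points):
    """Raw stress: squared mismatch between target and embedded distances."""
    d = pairwise_distances(points)
    n = len(points)
    return sum((dissim[i][j] - d[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def smacof_embed(dissim, dim=3, iterations=200, seed=0):
    """Embed items in `dim` dimensions with unweighted SMACOF updates."""
    rng = random.Random(seed)
    n = len(dissim)
    # Random starting configuration; the seed keeps runs reproducible.
    points = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(iterations):
        d = pairwise_distances(points)
        updated = []
        for i in range(n):
            row = [0.0] * dim
            for j in range(n):
                if i == j or d[i][j] == 0.0:
                    continue  # coincident points contribute nothing
                ratio = dissim[i][j] / d[i][j]
                for k in range(dim):
                    row[k] += ratio * (points[i][k] - points[j][k])
            # Guttman transform with uniform weights: X <- (1/n) B(X) X
            updated.append([v / n for v in row])
        points = updated
    return points


# Toy, made-up similarity scores for four proteins forming two groups
# (proteins 0 and 1 are similar; proteins 2 and 3 are similar).
sim = [
    [1.00, 0.90, 0.20, 0.10],
    [0.90, 1.00, 0.15, 0.20],
    [0.20, 0.15, 1.00, 0.85],
    [0.10, 0.20, 0.85, 1.00],
]
# Assumed conversion from similarity to dissimilarity.
dissim = [[1.0 - s for s in row] for row in sim]

coords = smacof_embed(dissim)
for label, xyz in enumerate(coords):
    print(label, [round(v, 3) for v in xyz])
```

Each row of `coords` is a point in 3D that could be passed to any scatter-plot tool. For real protein collections the quadratic cost of the full distance matrix becomes the limiting factor, which is the scaling problem the abstract refers to.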