Many areas of science are seeing a data deluge from new instruments, myriads of sensors, and exponential growth in electronic records. We take two examples: one is the analysis of gene sequence data (35339 Alu sequences), and the other is a study of medical information (over 100,000 patient records) in Indianapolis and its relationship to Geographic Information System and Census data available for 635 Census Blocks in Indianapolis. We look at initial processing (such as computing Smith-Waterman dissimilarities), clustering (using robust deterministic annealing), and Multidimensional Scaling to map high-dimensional data to 3D for convenient visualization. We show how scaling pipelines can be produced that can be implemented using either cloud technologies or MPI, and we compare the two approaches. This study illustrates the challenges in integrating data exploration tools that have a variety of different architectural requirements and natural programming models. We present preliminary results for an end-to-end study of two complete applications.
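
As an illustration of the initial processing step, the following is a minimal sketch of a Smith-Waterman local alignment score and a pairwise dissimilarity derived from it. The scoring parameters (`match`, `mismatch`, `gap`), the linear gap penalty, and the normalization used in `dissimilarity` are assumptions chosen for illustration; the abstract does not state which scoring scheme or dissimilarity definition the study used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty. The default scores are illustrative
    assumptions, not values taken from the study.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def dissimilarity(a, b):
    """Hypothetical normalization: 1 - score(a, b) / min(self-scores).

    Gives 0.0 for identical sequences and 1.0 when no local alignment
    scores above zero. This is an assumed definition for illustration.
    """
    denom = min(smith_waterman_score(a, a), smith_waterman_score(b, b))
    if denom == 0:
        return 1.0
    return 1.0 - smith_waterman_score(a, b) / denom


def dissimilarity_matrix(sequences):
    """Build the full symmetric matrix of pairwise dissimilarities."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dissimilarity(sequences[i], sequences[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

A matrix of this kind is the sort of input that pairwise clustering and multidimensional scaling operate on. Because every pair of sequences is scored independently, the computation divides naturally into blocks of pairs, which is what makes it a candidate for either a cloud-style or an MPI implementation.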