Large, high-dimensional datasets are of growing importance in many fields, and it is important to be able to visualize them, either to understand the results of data mining or simply to browse them, in such a way that the distance between points in the visualization (2D or 3D) space tracks the distance in the original high-dimensional space. Dimension reduction is a well-understood approach, but it can be very time- and memory-intensive for large problems. Here we report on parallel algorithms for two methods: Scaling by MAjorizing a COmplicated Function (SMACOF), which solves the Multidimensional Scaling (MDS) problem, and Generative Topographic Mapping (GTM). SMACOF is particularly time-consuming, with complexity that grows as the square of the dataset size, but it has the advantage that it does not require explicit vectors for the data points, only measurements of inter-point dissimilarities. We compare SMACOF and GTM on a subset of the NIH PubChem database, in which each entry is a binary vector of 166 bits. We find good parallel performance for both GTM and SMACOF, and a strong correlation between the dimension-reduced PubChem data produced by the two methods.
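To make the SMACOF properties mentioned above concrete, the following is a minimal serial sketch of unweighted SMACOF in NumPy. It is not the parallel implementation reported here. The function names (`pairwise_euclidean`, `stress`, `smacof`) are hypothetical, the random 166-bit vectors are synthetic stand-ins for PubChem fingerprints, and the use of Euclidean distance between bit vectors as the dissimilarity is an assumption made only for illustration. The sketch shows that the algorithm consumes only the N x N dissimilarity matrix, and that each iteration touches all N^2 pairs, which is where the quadratic cost comes from.

```python
import numpy as np

def pairwise_euclidean(X):
    """Return the N x N matrix of Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(delta, X):
    """Raw stress: sum over pairs i < j of (d_ij(X) - delta_ij)^2."""
    d = pairwise_euclidean(X)
    iu = np.triu_indices(len(X), k=1)
    return ((d[iu] - delta[iu]) ** 2).sum()

def smacof(delta, dim=2, n_iter=100, seed=0):
    """Unweighted SMACOF: repeatedly apply the Guttman transform.

    Only the dissimilarity matrix `delta` is needed, not the original vectors.
    """
    n = delta.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))      # random initial configuration
    history = [stress(delta, X)]
    for _ in range(n_iter):
        d = pairwise_euclidean(X)          # O(N^2) work per iteration
        ratio = np.zeros_like(d)
        # delta_ij / d_ij where d_ij > 0; left at 0 for coincident points
        np.divide(delta, d, out=ratio, where=d > 0)
        B = -ratio                          # off-diagonal entries of B(X)
        B[np.diag_indices(n)] = -B.sum(axis=1)  # each row of B sums to zero
        X = B @ X / n                       # Guttman transform
        history.append(stress(delta, X))
    return X, history

# Synthetic stand-in data (assumption): 50 random binary vectors of 166 bits.
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=(50, 166)).astype(float)
delta = pairwise_euclidean(bits)           # assumed dissimilarity measure
embedding, stress_history = smacof(delta, dim=2, n_iter=50)
```

Because each update minimizes a majorizing function of the stress, `stress_history` should not increase from one iteration to the next. A parallel version would typically partition the rows of the distance and `B` matrices across processes, which is one plausible decomposition rather than a description of the method used here.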