Graph algorithms on parallel architectures present an interesting case study for irregular applications. Among the graph algorithms popular in scientific computing, graph clustering, or community detection, has numerous applications in computational biology. However, this operation also poses serious computational challenges because of its irregular memory access patterns, large memory requirements, and dependence on auxiliary (also irregular) data structures to supplement processing. In this paper, we address the problem of graph clustering on shared-memory machines. We present a new OpenMP-based parallel algorithm called pClust-sm, which uses adjacency lists, hash tables, and union-find data structures in parallel. The algorithm improves both the asymptotic runtime and the memory complexity of a previous serial implementation. Preliminary results show that this algorithm can scale up to 8 threads (cores) of a shared-memory machine on a real-world metagenomics input graph with 1.2M vertices and 100M edges. More importantly, the new implementation drastically reduces the time to solution from the order of several hours to just over 4 minutes, and it extends the problem size reach by at least one order of magnitude.
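To make the role of the union-find structure concrete, the following is a minimal serial sketch in Python of how such a structure can merge vertices into clusters. It is not the pClust-sm implementation: the abstract does not describe how pClust-sm decides which vertices to merge or how it coordinates threads, so the class `UnionFind`, the helper `cluster_edges`, and the rule "merge the endpoints of every listed edge" are assumptions made purely for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))  # each vertex starts as its own root
        self.size = [1] * n           # size of the tree rooted at each vertex

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same cluster
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def cluster_edges(num_vertices, edges):
    """Hypothetical helper: merge the endpoints of every edge, then
    report the resulting groups of vertices."""
    uf = UnionFind(num_vertices)
    for u, v in edges:
        uf.union(u, v)
    groups = defaultdict(list)
    for v in range(num_vertices):
        groups[uf.find(v)].append(v)
    return sorted(groups.values())


clusters = cluster_edges(6, [(0, 1), (1, 2), (3, 4)])
print(clusters)  # [[0, 1, 2], [3, 4], [5]]
```

In this sketch each `find` and `union` call takes near-constant amortized time, which is one reason union-find suits graphs with many edges. A shared-memory version would additionally need some form of synchronization around updates to `parent`, which this serial sketch does not attempt.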