Crunching large graphs with commodity processors

Authors:
Jacob Nelson;Brandon Myers;A. H. Hunter;Preston Briggs;Luis Ceze;Carl Ebeling;Dan Grossman;Simon Kahan;Mark Oskin
Affiliations:
University of Washington;University of Washington;University of Washington;University of Washington;University of Washington;University of Washington;University of Washington;University of Washington and Pacific Northwest National Laboratory;University of Washington
Venue:
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Year:
2011

Abstract

Crunching large graphs is the basis of many emerging applications, such as social network analysis and bioinformatics. Graph analytics algorithms exhibit little locality and therefore present significant performance challenges. Hardware multithreading systems (e.g., Cray XMT) show that, with enough concurrency, long memory latencies can be tolerated. Unfortunately, this solution is not available with commodity parts. Our goal is to develop a latency-tolerant system built out of commodity parts and implemented mostly in software. The proposed system includes a runtime that supports a large number of lightweight contexts, full/empty-bit synchronization, and a memory manager that provides a high-latency but high-bandwidth global shared memory. This paper lays out the vision for our system and justifies its feasibility with a performance analysis of the runtime for latency tolerance.
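
The latency-tolerance idea in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' runtime: it models lightweight contexts as Python generators that yield at every remote load, with a round-robin scheduler that switches to another context instead of stalling. The names `chase` and `run`, the 100-tick latency, and the 1-tick switch cost are all assumptions chosen for illustration.

```python
from collections import deque

LATENCY = 100  # assumed remote-load latency in ticks (hypothetical value)


def chase(start, hops):
    """Lightweight context: follow `hops` dependent pointers.

    Each `yield addr` models issuing a remote load; the scheduler
    resumes the context later with the loaded value.
    """
    addr = start
    for _ in range(hops):
        addr = yield addr
    return addr


def run(memory, starts, hops, latency=LATENCY):
    """Round-robin scheduler over one context per start address.

    Returns (results, ticks). Resuming a context costs 1 tick; a load
    issued at time t is available at time t + latency.
    """
    now = 0
    results = [None] * len(starts)
    # Queue entries: (time the data is ready, context id, generator, value)
    queue = deque((0, cid, chase(s, hops), None) for cid, s in enumerate(starts))
    while queue:
        ready_at, cid, ctx, value = queue.popleft()
        # Stall only if this context's data has not arrived yet.
        now = max(now, ready_at) + 1
        try:
            addr = ctx.send(value)  # resume until the next remote load
        except StopIteration as done:
            results[cid] = done.value  # context finished its pointer chase
            continue
        queue.append((now + latency, cid, ctx, memory[addr]))
    return results, now


if __name__ == "__main__":
    memory = [(i * 7 + 3) % 1000 for i in range(1000)]  # synthetic pointer table
    _, t1 = run(memory, [0], hops=10)
    _, t64 = run(memory, list(range(64)), hops=10)
    print("1 context:", t1, "ticks; 64 contexts:", t64, "ticks")
```

Under these assumptions, 64 contexts complete 64 times as many loads in roughly the same number of ticks as a single context, because each context's load latency is overlapped with the execution of the others. Once the number of contexts exceeds the latency measured in switch costs, the simulated processor becomes the bottleneck and adding contexts no longer helps.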