Crunching large graphs is the basis of many emerging applications, such as social network analysis and bioinformatics. Graph analytics algorithms exhibit little locality and therefore present significant performance challenges. Hardware multithreading systems (e.g., Cray XMT) show that, with enough concurrency, long memory latencies can be tolerated. Unfortunately, this solution is not available with commodity parts. Our goal is to develop a latency-tolerant system built from commodity parts and implemented mostly in software. The proposed system includes a runtime that supports a large number of lightweight contexts, full-bit synchronization, and a memory manager that provides a high-latency but high-bandwidth global shared memory. This paper lays out the vision for our system and justifies its feasibility with a performance analysis of the runtime for latency tolerance.
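The claim that enough concurrency can hide long latencies can be illustrated with a standard textbook model of multithreading. The sketch below is not the paper's own performance analysis. It assumes each context alternates a fixed amount of compute with one long-latency remote access and that context switches are free. The function names and the numbers in the usage note are hypothetical, chosen only for illustration.

```python
import math


def utilization(n_contexts: int, latency_ns: float, work_ns: float) -> float:
    """Fraction of time a core stays busy under the simple model.

    Assumption: each context runs for work_ns, then waits latency_ns for a
    remote access; switching between contexts costs nothing.
    """
    return min(1.0, n_contexts * work_ns / (work_ns + latency_ns))


def contexts_needed(latency_ns: float, work_ns: float) -> int:
    """Smallest number of contexts that fully hides the latency in this model."""
    return math.ceil((work_ns + latency_ns) / work_ns)


if __name__ == "__main__":
    # Hypothetical figures: 2000 ns remote access, 50 ns of work between accesses.
    n = contexts_needed(2000, 50)
    print(n, utilization(1, 2000, 50), utilization(n, 2000, 50))
```

With these illustrative figures, a single context keeps the core busy only about 2% of the time, while roughly forty contexts per core are enough to saturate it under the model. This suggests why a runtime with many lightweight contexts is central to the approach.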