A scalable content-addressable network
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Introduction to Parallel Processing: Algorithms and Architectures
Introduction to Parallel Processing: Algorithms and Architectures
Directed diffusion for wireless sensor networking
IEEE/ACM Transactions on Networking (TON)
ACM Transactions on Computer Systems (TOCS)
ICNP '97 Proceedings of the 1997 International Conference on Network Protocols (ICNP '97)
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
A scalable distributed information management system
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
TAG: a Tiny AGgregation service for Ad-Hoc sensor networks
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
NetFPGA: reusable router architecture for experimental research
Proceedings of the ACM workshop on Programmable routers for extensible services of tomorrow
A scalable, commodity data center network architecture
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
VL2: a scalable and flexible data center network
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
BCube: a high performance, server-centric network architecture for modular data centers
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Safe and effective fine-grained TCP retransmissions for datacenter communication
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
MapReduce: a flexible data processing tool
Communications of the ACM - Amir Pnueli: Ahead of His Time
RouteBricks: exploiting parallelism to scale software routers
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Distributed aggregation for data-parallel computing: interfaces and implementations
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Blue Gene/L torus interconnection network
IBM Journal of Research and Development
Data warehousing and analytics infrastructure at facebook
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Symbiotic routing in future data centers
Proceedings of the ACM SIGCOMM 2010 conference
Proceedings of the ACM SIGCOMM 2010 conference
Hedera: dynamic flow scheduling for data center networks
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Network traffic characteristics of data centers in the wild
IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
Scafida: a scale-free network inspired data center architecture
ACM SIGCOMM Computer Communication Review
ICTCP: Incast Congestion Control for TCP in data center networks
Proceedings of the 6th International COnference
ServerSwitch: a programmable and high performance platform for data center networks
Proceedings of the 8th USENIX conference on Networked systems design and implementation
CIEL: a universal execution engine for distributed data-flow computing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Apache hadoop goes realtime at Facebook
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Proceedings of the 2nd ACM Symposium on Cloud Computing
Incoop: MapReduce for incremental computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
The Case for Evaluating MapReduce Performance Using Workload Suites
MASCOTS '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems
Willow: DHT, aggregation, and publish/subscribe in one protocol
IPTPS'04 Proceedings of the Third international conference on Peer-to-Peer Systems
Jellyfish: networking data centers randomly
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Nobody ever got fired for using Hadoop on a cluster
Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing
Programming your network at run-time for big data applications
Proceedings of the first workshop on Hot topics in software defined networks
Bridging the gap between applications and networks in data centers
ACM SIGOPS Operating Systems Review
CamCubeOS: a key-based network stack for 3D torus cluster topologies
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Exploiting in-network processing for big data management
Proceedings of the 2013 Sigmod/PODS Ph.D. symposium on PhD symposium
Supporting application-specific in-network processing in data centres
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
Hi-index | 0.00 |
Large companies like Facebook, Google, and Microsoft as well as a number of small and medium enterprises daily process massive amounts of data in batch jobs and in real time applications. This generates high network traffic, which is hard to support using traditional, oversubscribed, network infrastructures. To address this issue, several novel network topologies have been proposed, aiming at increasing the bandwidth available in enterprise clusters. We observe that in many of the commonly used workloads, data is aggregated during the process and the output size is a fraction of the input size. This motivated us to explore a different point in the design space. Instead of increasing the bandwidth, we focus on decreasing the traffic by pushing aggregation from the edge into the network. We built Camdoop, a MapReduce-like system running on CamCube, a cluster design that uses a direct-connect network topology with servers directly linked to other servers. Camdoop exploits the property that CamCube servers forward traffic to perform in-network aggregation of data during the shuffle phase. Camdoop supports the same functions used in MapReduce and is compatible with existing MapReduce applications. We demonstrate that, in common cases, Camdoop significantly reduces the network traffic and provides high performance increase over a version of Camdoop running over a switch and against two production systems, Hadoop and Dryad/DryadLINQ.