Gigascope: a stream database for network applications
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Data mining with the SAP NetWeaver BI accelerator
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Bigtable: A Distributed Storage System for Structured Data
ACM Transactions on Computer Systems (TOCS)
Better tree - better fruits: using dominating set trees for MAX queries
Proceedings of the 5th workshop on Data management for sensor networks
A scalable, commodity data center network architecture
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
H-store: a high-performance, distributed main memory transaction processing system
Proceedings of the VLDB Endowment
VL2: a scalable and flexible data center network
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
BCube: a high performance, server-centric network architecture for modular data centers
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
SmartRE: an architecture for coordinated network-wide redundancy elimination
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
The case for RAMClouds: scalable high-performance storage entirely in DRAM
ACM SIGOPS Operating Systems Review
Cassandra: a decentralized structured storage system
ACM SIGOPS Operating Systems Review
Extreme scale with full SQL language support in Microsoft SQL Azure
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Improving the scalability of data center networks with traffic-aware virtual machine placement
INFOCOM'10 Proceedings of the 29th conference on Information communications
SideCar: building programmable datacenter networks without programmable switches
Hotnets-IX Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks
Dremel: interactive analysis of web-scale datasets
Proceedings of the VLDB Endowment
CloudNaaS: a cloud networking platform for enterprise applications
Proceedings of the 2nd ACM Symposium on Cloud Computing
Fast crash recovery in RAMCloud
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
SAP HANA database: data management for modern business applications
ACM SIGMOD Record
Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
NaaS: network-as-a-service in the cloud
Hot-ICE'12 Proceedings of the 2nd USENIX conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services
Camdoop: exploiting in-network aggregation for big data applications
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
RAMCube: exploiting network proximity for ram-based key-value store
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
Data processing systems must store and process data efficiently at petabyte scale, and data volumes are set to grow further. To meet this requirement, highly scalable, shared-nothing systems, e.g. Google's Bigtable [6] or Facebook's Cassandra [14], partition data and process it in parallel on distributed nodes in a cluster. This allows data to be handled at scale but introduces new challenges due to its distribution. Running queries incurs a high network overhead because data has to be exchanged between cluster nodes; hence, the network becomes a critical part of the system. To avoid the network bottleneck, it is essential for distributed data processing systems (DDPS) to be aware of the network rather than treating it as a black box. We propose in-network processing as a way of achieving network-awareness, decreasing bandwidth usage through custom routing, redundancy elimination, and on-path data reduction, and thereby increasing the query throughput of a DDPS. The challenges of an in-network processing system range from design issues, such as performance and transparency, to integration with query optimisation and deployment in data centres. We formulate these challenges as possible research directions and provide a prototype implementation. Our preliminary results suggest that performing partial data reduction within the network can significantly improve query throughput in a DDPS.
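As an illustration of on-path data reduction, the following minimal sketch compares how many records reach the query root when every worker sends its partial result directly and when an intermediate node merges partials first. It is not the prototype described above: the two-rack layout, the per-key sum query, and the names combine, records_sent_direct and records_sent_on_path are all assumptions made for the example.

```python
from collections import Counter

def combine(partials):
    """Merge partial per-key sums (an associative, commutative reduction)."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total

def records_sent_direct(worker_outputs):
    """Baseline: every worker ships its partial result straight to the root."""
    return sum(len(partial) for partial in worker_outputs)

def records_sent_on_path(racks):
    """Assumed topology: one on-path node per rack merges its workers'
    partials and forwards a single reduced result towards the root."""
    forwarded = [combine(rack) for rack in racks]
    return sum(len(partial) for partial in forwarded), combine(forwarded)

# Hypothetical partial results of a per-key SUM, grouped by rack.
racks = [
    [Counter({"a": 3, "b": 1}), Counter({"a": 2, "c": 4})],
    [Counter({"b": 5, "c": 1}), Counter({"b": 2, "c": 2})],
]
workers = [partial for rack in racks for partial in rack]

direct = records_sent_direct(workers)
reduced, result = records_sent_on_path(racks)

# The reduction is transparent to the query: the final answer is unchanged.
assert result == combine(workers)
print(direct, reduced, dict(result))
```

In this toy input, 8 records reach the root without reduction and 5 with it, while the query result is identical. The saving depends on how many keys the partial results share, and merging on the path is only safe for reductions that are associative and commutative, such as sums, counts, minima and maxima.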