CRUCIBLE: towards unified secure on- and off-line analytics at scale

Authors:
Peter Coetzee; Stephen Jarvis
Affiliations:
University of Warwick, Coventry, United Kingdom (both authors)
Venue:
DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
Year:
2013

Abstract

The burgeoning field of data science applies a variety of analytic models and techniques to the oft-cited problems of large data volumes, high-velocity data rates, and significant variety in data structure and semantics. Most approaches apply common analytic techniques within either a streaming or a batch processing paradigm. This paper reports progress on a framework for analysing large-scale datasets that draws on both pools of techniques in a unified manner. Its contributions are: (1) a Domain Specific Language (DSL) for describing analyses as a set of Communicating Sequential Processes, fully integrated with the Java type system and supported by an Integrated Development Environment (IDE) and a compiler that generates idiomatic Java; (2) a runtime model that executes the same analytic in both streaming and batch environments; and (3) a novel approach to the automated management of cell-level security labels, applied uniformly across all runtimes. The paper concludes by demonstrating the system on a sample workload written in the DSL of (1), and by analysing the performance characteristics of each runtime described in (2).
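The abstract does not say how CRUCIBLE represents or propagates cell-level labels, so the Java sketch below is illustrative only and is not the paper's API. It assumes a simple conjunctive model, in which a reader must hold every label on a cell and a value derived from several cells carries the union of their labels. It also shows, in miniature, the idea behind contribution (2): one analytic (here a fold) written once and driven by either a batch runtime or a streaming runtime. All names (`Labelled`, `combine`, `runBatch`, `runStreaming`, the `PUBLIC`/`FINANCE` labels) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BinaryOperator;

// Hypothetical sketch: NOT CRUCIBLE's API, just one possible model of labelled cells.
public class LabelledPipelineSketch {

    // A value plus the labels (authorizations) a reader must hold to see it.
    record Labelled<T>(T value, Set<String> labels) {
        Labelled {
            // Immutable, sorted copy so label order is deterministic when printed.
            labels = Collections.unmodifiableSet(new TreeSet<>(labels));
        }

        // Assumed conjunctive semantics: the reader needs every label on the cell.
        boolean visibleTo(Set<String> authorizations) {
            return authorizations.containsAll(labels);
        }
    }

    // Combine two cells: apply op to the values and take the union of the labels,
    // so a derived value is never less restricted than any of its inputs.
    static <T> Labelled<T> combine(Labelled<T> a, Labelled<T> b, BinaryOperator<T> op) {
        Set<String> union = new TreeSet<>(a.labels());
        union.addAll(b.labels());
        return new Labelled<>(op.apply(a.value(), b.value()), union);
    }

    // Batch runtime: fold the whole bounded input and emit a single result.
    static <T> Labelled<T> runBatch(List<Labelled<T>> input, Labelled<T> zero, BinaryOperator<T> op) {
        Labelled<T> acc = zero;
        for (Labelled<T> cell : input) {
            acc = combine(acc, cell, op);
        }
        return acc;
    }

    // Streaming runtime: the same fold, emitting an updated result per arriving cell.
    static <T> List<Labelled<T>> runStreaming(Iterable<Labelled<T>> arrivals, Labelled<T> zero,
                                              BinaryOperator<T> op) {
        List<Labelled<T>> emitted = new ArrayList<>();
        Labelled<T> acc = zero;
        for (Labelled<T> cell : arrivals) {
            acc = combine(acc, cell, op);
            emitted.add(acc);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Labelled<Integer>> cells = List.of(
                new Labelled<>(3, Set.of("PUBLIC")),
                new Labelled<>(4, Set.of("PUBLIC", "FINANCE")),
                new Labelled<>(5, Set.of("PUBLIC")));
        Labelled<Integer> zero = new Labelled<>(0, Set.of());

        Labelled<Integer> batch = runBatch(cells, zero, Integer::sum);
        List<Labelled<Integer>> stream = runStreaming(cells, zero, Integer::sum);

        System.out.println("batch:  " + batch.value() + " " + batch.labels());
        System.out.println("stream: " + stream.get(stream.size() - 1).value());
        System.out.println("visible to PUBLIC only? " + batch.visibleTo(Set.of("PUBLIC")));
    }
}
```

Running `main` prints a batch total of 12 labelled `[FINANCE, PUBLIC]`, a final streaming value of 12, and `false` for a reader holding only `PUBLIC`. Taking the union is deliberately conservative: once the `FINANCE` cell has been folded in, every later streaming output also requires `FINANCE`. A real system might instead track labels more finely, which is one reason automated label management is worth treating as a contribution in its own right.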