As massive data acquisition and storage become increasingly affordable, a wide variety of enterprises are employing statisticians to engage in sophisticated data analysis. In this paper we highlight the emerging practice of Magnetic, Agile, Deep (MAD) data analysis as a radical departure from traditional Enterprise Data Warehouses and Business Intelligence. We present our design philosophy, techniques, and experience providing MAD analytics for Fox Audience Network, one of the world's largest advertising networks, using the Greenplum parallel database system. We describe database design methodologies that support the agile working style of analysts in these settings. We present data-parallel algorithms for sophisticated statistical techniques, with a focus on density methods. Finally, we reflect on database system features that enable agile design and flexible algorithm development using both SQL and MapReduce interfaces over a variety of storage mechanisms.
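To make "data-parallel statistical algorithm" concrete, here is a minimal Python sketch of the general pattern. It is not the paper's implementation. Each segment of a shared-nothing database folds its rows into a small partial state (count, sum, sum of squares). The partial states are then merged, and a final step computes the mean and sample variance. The transition/merge/final split and every name in the block (`Moments`, `transition`, `merge`, `final`, `parallel_mean_var`) are hypothetical. They are modeled loosely on how user-defined aggregates are commonly structured in parallel databases.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Moments:
    """Hypothetical partial-aggregate state: row count, sum, and sum of squares."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0


def transition(state: Moments, x: float) -> Moments:
    # Fold one row into a segment-local state (runs independently on each segment).
    return Moments(state.n + 1, state.s + x, state.ss + x * x)


def merge(a: Moments, b: Moments) -> Moments:
    # Combine the states of two segments; associative and commutative.
    return Moments(a.n + b.n, a.s + b.s, a.ss + b.ss)


def final(state: Moments) -> Tuple[float, float]:
    # Turn the fully merged state into (mean, sample variance).
    if state.n < 2:
        raise ValueError("need at least two rows for a sample variance")
    mean = state.s / state.n
    var = (state.ss - state.n * mean * mean) / (state.n - 1)
    return mean, var


def parallel_mean_var(segments: List[Iterable[float]]) -> Tuple[float, float]:
    # In a parallel database each segment would be scanned on its own node;
    # this sketch simply processes the segments one after another.
    partials = [reduce(transition, seg, Moments()) for seg in segments]
    return final(reduce(merge, partials, Moments()))


if __name__ == "__main__":
    # Two "segments" holding the rows 1..5.
    print(parallel_mean_var([[1, 2, 3], [4, 5]]))
```

`merge` is associative and commutative, so segments can be combined in any order. That property is what lets the work spread across nodes. The naive sum-of-squares formula can lose precision when values are large relative to their spread. A production version would more likely use a numerically stable pairwise update.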