Towards a unified architecture for in-RDBMS analytics
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
The increasing use of statistical data analysis in enterprise applications has created an arms race among database vendors to offer ever more sophisticated in-database analytics. One challenge in this race is that each new statistical technique must be implemented from scratch in the RDBMS, which leads to a lengthy and complex development process. We argue that the root cause of this overhead is the lack of a unified architecture for in-database analytics. Our main contribution in this work is to take a step towards such a unified architecture. A key benefit of our unified architecture is that performance optimizations for analytics techniques can be studied generically instead of in an ad hoc, per-technique fashion. In particular, our technical contributions are theoretical and empirical studies of two key factors that we found to impact performance: the order in which data is stored, and the parallelization of computations on a single-node multicore RDBMS. We demonstrate the feasibility of our architecture by integrating several popular analytics techniques into two commercial RDBMSes and one open-source RDBMS. Integrating a new statistical technique into our architecture requires changes to only a few dozen lines of code. We then compare our approach with the native analytics tools offered by the commercial RDBMSes on various analytics tasks, and validate that our approach achieves competitive or higher performance while still achieving the same quality.