Relational DBMSs remain the main data management technology, despite the big data analytics and NoSQL waves. For data analytics in a broad sense, however, there are plenty of non-DBMS tools, including statistical languages, matrix packages, generic data mining programs, and large-scale parallel systems, the last being the main technology for big data analytics. Such large-scale systems are mostly based on the Hadoop distributed file system and MapReduce. It would thus seem that a DBMS is not a good technology for analyzing big data beyond SQL queries, and that it serves merely as a reliable and fast data repository. In this survey, we argue that this is not the case, explaining important research that has enabled analytics on large databases inside a DBMS. However, we also argue that DBMSs cannot compete with parallel systems like MapReduce in analyzing web-scale text data. Therefore, the two technologies will keep influencing each other. We conclude with a proposal of long-term research issues, considering the "big data analytics" trend.