Data mining is increasingly performed by people who are not computer scientists or professional programmers. It is often an iterative process involving multiple ad-hoc tasks, as well as data pre- and post-processing, all of which must be executed over large databases. To make data mining more accessible, it is critical to provide a simple, easy-to-use language that allows the user to specify ad-hoc data processing, model construction, and model manipulation. At the same time, the underlying system must scale to large datasets. Unfortunately, while existing systems can satisfy each of these requirements individually, no single system satisfies all of them. In this paper, we present a system called Splash to fill this void. Splash supports an extended relational data model and SQL query language, which allows for the natural integration of statistical modeling and ad-hoc data processing. It also supports a novel representatives operator that helps explain models using a limited number of examples. We have developed a prototype implementation of Splash, and our experimental study indicates that it scales well to large input datasets. Further, to demonstrate the simplicity of the language, we conducted a case study in which we used Splash to perform a series of exploratory analyses on network log data. The case study indicates that the query-based interface is simpler than a common data mining software package and often requires less programming effort.