A significant amount of data mining analysis is performed outside the database system, which creates many data management issues. This article summarizes our experience and recommendations for performing data set preprocessing and transformation (i.e., data cleaning, record selection, summarization, denormalization, variable creation, and coding) inside a database system. Such preparation is the most time-consuming task in data mining projects, yet it is largely ignored in the literature. We present practical issues, common solutions, and lessons learned when preparing and transforming data sets with SQL, based on experience from real-life projects. We then provide specific guidelines for translating programs written in a traditional programming language into SQL statements. Drawing on successful real-life projects, we compare the time performance of SQL code running inside the database system with that of external data mining programs, and we highlight which steps in data mining projects become faster when processed by the database system. More importantly, we identify advantages and disadvantages from a practical standpoint, based on feedback from data mining users.
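As a rough illustration of what preparing a data set inside the database can look like, the sketch below expresses several of the steps named above (record selection, summarization, denormalization, cleaning of missing values, and coding) as a single SQL statement. It is not taken from the article: the tables `customer` and `purchase`, their columns, and the helper `build_data_set` are hypothetical, and SQLite (via Python's standard `sqlite3` module) merely stands in for whatever database system a project would actually use.

```python
import sqlite3


def build_data_set(conn):
    """Return one row per active customer, prepared entirely by SQL.

    All table and column names are hypothetical and serve only as an example.
    """
    query = """
        SELECT c.customer_id,
               COUNT(t.amount)              AS n_purchases,   -- summarization
               COALESCE(SUM(t.amount), 0.0) AS total_amount,  -- cleaning: NULL becomes 0
               CASE WHEN c.state = 'TX' THEN 1 ELSE 0 END
                                            AS is_tx          -- coding as a binary flag
        FROM customer c
        LEFT JOIN purchase t
               ON t.customer_id = c.customer_id               -- denormalization
        WHERE c.active = 1                                    -- record selection
        GROUP BY c.customer_id, c.state
        ORDER BY c.customer_id
    """
    return conn.execute(query).fetchall()


# Small in-memory database so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, state TEXT, active INTEGER);
    CREATE TABLE purchase (customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'TX', 1), (2, 'CA', 1), (3, 'TX', 0);
    INSERT INTO purchase VALUES (1, 10.0), (1, 5.5), (3, 7.0);
""")

rows = build_data_set(conn)
for row in rows:
    print(row)
```

An external program would typically export both tables, loop over the records, and accumulate totals per customer in memory. Here the join, aggregation, and flag creation all run inside the database engine, and only the finished data set leaves it.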