Data quality is critical in a typical Data Warehouse (DW) environment. The process of transferring data from different sources into the DW, known as ETL (Extraction, Transformation, and Load), is usually responsible for improving data quality. However, it is not unusual to find null values in a DW fact table during the ETL process, and these can negatively affect the accuracy of data analysis results. Data imputation techniques are commonly used to deal with the missing value problem; some of them use the values already present in a table to estimate a replacement for each missing one. This paper proposes a new strategy for handling missing data in the ETL process: enriching the DW fact table with dimension attributes in order to obtain better imputation results. The strategy uses the k-NN (k-nearest neighbors) algorithm as the imputation approach. Tests performed on a prototype implementation showed promising results with respect to imputation quality.
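The abstract does not specify how the dimension attributes enter the k-NN distance computation, so the following is only a minimal sketch under stated assumptions. The fact table is first joined with its dimension tables, and each missing measure is then filled with the mean of its k nearest complete rows. The distance is a heterogeneous Euclidean-overlap metric (range-normalized differences for numeric attributes, 0/1 mismatch for categorical ones). The names `enrich`, `knn_impute`, `store_id`, `product_id`, and `qty`, the distance metric, and the averaging rule are all hypothetical illustrative choices, not the paper's method.

```python
import math
from typing import Any

Row = dict[str, Any]


def enrich(facts: list[Row], dims: dict[str, dict[Any, Row]]) -> list[Row]:
    """Join each fact row with the attributes of the dimension rows it references.

    Assumption: `dims` maps a foreign-key column name to a lookup table keyed by
    that key's value. Dimension attributes are prefixed with the key name.
    """
    enriched = []
    for fact in facts:
        row = dict(fact)
        for fk, table in dims.items():
            for attr, value in table[fact[fk]].items():
                row[f"{fk}.{attr}"] = value
        enriched.append(row)
    return enriched


def _distance(a: Row, b: Row, features: list[str], ranges: dict[str, float]) -> float:
    """HEOM-style distance (an assumed choice): normalized numeric gap, 0/1 otherwise."""
    total = 0.0
    for f in features:
        x, y = a[f], b[f]
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            span = ranges[f]
            d = abs(x - y) / span if span > 0 else 0.0
        else:
            d = 0.0 if x == y else 1.0
        total += d * d
    return math.sqrt(total)


def knn_impute(rows: list[Row], target: str, features: list[str], k: int = 3) -> list[Row]:
    """Fill rows whose `target` is None with the mean target of the k nearest complete rows."""
    complete = [r for r in rows if r[target] is not None]
    if not complete:
        raise ValueError("no complete rows to impute from")
    # Range of each numeric feature, used to normalize numeric differences to [0, 1].
    ranges = {}
    for f in features:
        nums = [r[f] for r in rows if isinstance(r[f], (int, float))]
        ranges[f] = (max(nums) - min(nums)) if nums else 0.0
    result = []
    for r in rows:
        if r[target] is None:
            nearest = sorted(complete, key=lambda c: _distance(r, c, features, ranges))[:k]
            r = {**r, target: sum(c[target] for c in nearest) / len(nearest)}
        result.append(r)
    return result


# Hypothetical usage: a sales fact table with store and product dimensions.
stores = {
    1: {"region": "south", "size": 100},
    2: {"region": "south", "size": 120},
    3: {"region": "north", "size": 400},
}
products = {"a": {"category": "food"}, "b": {"category": "toys"}}
facts = [
    {"store_id": 1, "product_id": "a", "qty": 10},
    {"store_id": 2, "product_id": "a", "qty": 12},
    {"store_id": 3, "product_id": "a", "qty": 40},
    {"store_id": 3, "product_id": "b", "qty": 50},
    {"store_id": 2, "product_id": "a", "qty": None},  # missing measure
]
rows = enrich(facts, {"store_id": stores, "product_id": products})
feats = ["store_id.region", "store_id.size", "product_id.category"]
filled = knn_impute(rows, "qty", feats, k=2)
print(filled[-1]["qty"])  # → 11.0
```

With k=2, the missing quantity is estimated from the two other southern, food-category rows rather than from the large northern store. This illustrates why descriptive dimension attributes can give the distance function more to work with than the bare foreign keys in the fact table.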