The process of preparing data for mining includes extracting the data from multiple sources, cleaning it, transforming it to a common format, and finally, writing the data to the target warehouse(s) or file(s). Because data mining applications look for patterns and correlations that were not previously predicted, these applications are particularly susceptible to spurious results if the input data are bad. Important patterns may be missed, and nonexistent correlations detected. The data must be cleaned and transformed into a structure usable by the data mining application. The process involves moving and processing very large amounts of data on a regular basis, and significant challenges exist in performing these steps efficiently. This article describes the kinds of corrections that must be made and techniques for managing and optimizing the transformation process.
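The four stages named above (extract, clean, transform, write) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the article's method. The two source layouts, the `STATE_CODES` lookup, and the cleaning rules are all hypothetical choices made for the example: trimming whitespace, dropping rows without a name, and removing exact duplicates. An in-memory buffer stands in for the target warehouse or file.

```python
import csv
import io
from datetime import datetime

# Hypothetical extracts from two sources that use different conventions.
SOURCE_A = """cust_id,name,signup,state
1, Alice Smith ,03/15/1997,California
2,Bob Jones,07/01/1997,new york
2,Bob Jones,07/01/1997,new york
"""
SOURCE_B = """id;full_name;joined;st
3;Carol White;1997-09-30;TX
4;;1997-10-02;CA
"""

# Hypothetical lookup mapping full state names to a common two-letter code.
STATE_CODES = {"california": "CA", "new york": "NY", "texas": "TX"}


def extract(text, delimiter):
    """Read one source into a list of dicts keyed by that source's headers."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def clean(rows, required):
    """Trim whitespace, drop rows missing a required field, drop exact duplicates."""
    seen, out = set(), []
    for row in rows:
        row = {k: (v or "").strip() for k, v in row.items()}
        if any(not row[field] for field in required):
            continue
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        out.append(row)
    return out


def transform_a(row):
    """Map a source-A row to the common format (ISO dates, state codes)."""
    return {
        "id": int(row["cust_id"]),
        "name": row["name"],
        "signup": datetime.strptime(row["signup"], "%m/%d/%Y").date().isoformat(),
        "state": STATE_CODES[row["state"].lower()],
    }


def transform_b(row):
    """Map a source-B row to the same common format."""
    return {
        "id": int(row["id"]),
        "name": row["full_name"],
        "signup": datetime.strptime(row["joined"], "%Y-%m-%d").date().isoformat(),
        "state": row["st"].upper(),
    }


def load(records):
    """Write common-format records to a CSV target (an in-memory buffer here)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name", "signup", "state"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


common = [transform_a(r) for r in clean(extract(SOURCE_A, ","), ("cust_id", "name"))]
common += [transform_b(r) for r in clean(extract(SOURCE_B, ";"), ("id", "full_name"))]
target = load(common)
print(target)
```

In this sketch, cleaning runs on each source's raw rows before they are mapped to the common schema, which follows the order described above. Detecting duplicates across sources, for example the same customer appearing in both, would be easier after the transform step, once the records share one format. A state name missing from the lookup raises a `KeyError`, which a production pipeline would route to an error table rather than abort on.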