Today, large corporations are constructing enterprise data warehouses from disparate data sources in order to run enterprise-wide data analysis applications, including decision support systems, multidimensional online analytical applications, data mining, and customer relationship management systems. A major problem that is only beginning to be recognized is that the data in these sources are often “dirty”. Broadly, dirty data include missing data, wrong data, and non-standard representations of the same data. The results of analyzing a database or data warehouse of dirty data can be damaging and are at best unreliable. In this paper, a comprehensive classification of dirty data is developed as a framework for understanding how dirty data arise, how they manifest themselves, and how they may be cleansed to ensure proper construction of data warehouses and accurate data analysis. The impact of dirty data on data mining is also explored.