In real-world databases, dirty data such as inconsistent and duplicate records reduces the effectiveness of database applications and raises new challenges for processing OLAP efficiently. CUBE is an important OLAP operator. This paper proposes a CUBE operation based on overlapping clustering, together with an effective and efficient method for storing and computing the CUBE over a database with dirty data. Building on this CUBE, the paper presents efficient algorithms for answering aggregation queries, as well as methods for processing the other major OLAP operators on a database with dirty data. Experimental results show the efficiency of the proposed algorithms.
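
For readers unfamiliar with the operator, the following is a minimal sketch of what a plain CUBE computes on clean data: one aggregate for every subset of the grouping dimensions. It is not the overlapping-clustering method the paper proposes, which the abstract does not describe in detail. The function name `cube`, the `ALL` placeholder, and the sample `sales` rows are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder value for a dimension that has been rolled up


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (2**len(dims) group-bys)."""
    totals = defaultdict(float)
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):
            for row in rows:
                # Keep the value of each grouped dimension; roll up the rest.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


# Hypothetical sample data, used only to exercise the sketch.
sales = [
    {"city": "Boston", "year": 2008, "amount": 10.0},
    {"city": "Boston", "year": 2009, "amount": 5.0},
    {"city": "Austin", "year": 2008, "amount": 7.0},
]
result = cube(sales, ["city", "year"], "amount")
```

Here `result[("ALL", "ALL")]` holds the grand total and `result[("Boston", "ALL")]` the sub-total for one city. With dirty data, a duplicated or inconsistent row would distort every cell it contributes to, which is the problem the paper addresses.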