DCUBE: CUBE on dirty databases

Authors:
Guohua Jiang;Hongzhi Wang;Shouxu Jiang;Jianzhong Li;Hong Gao
Affiliations:
Institute of Computer Science and Technology, Harbin Institute of Technology, Harbin, China;Institute of Computer Science and Technology, Harbin Institute of Technology, Harbin, China;Institute of Computer Science and Technology, Harbin Institute of Technology, Harbin, China;Institute of Computer Science and Technology, Harbin Institute of Technology, Harbin, China;Institute of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Venue:
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Year:
2010

Abstract

In real-world databases, dirty data such as inconsistent and duplicate records reduce the effectiveness of database applications and pose new challenges for processing OLAP efficiently. CUBE is an important OLAP operator. This paper proposes a CUBE operation based on overlapping clustering, together with an effective and efficient method for storing and computing CUBE on databases with dirty data. Building on this CUBE, the paper presents efficient algorithms for answering aggregation queries, as well as processing methods for the other major OLAP operators on databases with dirty data. Experimental results show the efficiency of the proposed algorithms.
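
For readers unfamiliar with the operator the abstract builds on, the following is a minimal sketch of the classical CUBE: it aggregates a measure over every subset of the grouping dimensions. It does not reproduce the paper's overlapping-clustering method or its handling of dirty data. The function name `cube`, the `ALL` placeholder, and the sample `sales` rows are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # hypothetical placeholder for a rolled-up dimension


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (classical CUBE, clean data only)."""
    result = defaultdict(float)
    for row in rows:
        # Each row contributes to one group per subset of the dimensions.
        for k in range(len(dims) + 1):
            for kept in combinations(dims, k):
                key = tuple(row[d] if d in kept else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)


# Illustrative sample data, not taken from the paper.
sales = [
    {"city": "Harbin", "year": 2009, "amount": 10.0},
    {"city": "Harbin", "year": 2010, "amount": 5.0},
    {"city": "Beijing", "year": 2010, "amount": 7.0},
]

totals = cube(sales, ["city", "year"], "amount")
for key in sorted(totals, key=str):
    print(key, totals[key])
```

With d dimensions each row feeds 2^d groups, which is why storing and computing the cube efficiently is a concern. On a dirty database, duplicate or inconsistent records would distort these sums, which is the problem the paper addresses.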