Supporting OLAP operations over imperfectly integrated taxonomies

Authors:
Yan Qi; K. Selçuk Candan; Junichi Tatemura; Songting Chen; Fenglin Liao
Affiliations:
Arizona State University, Tempe, AZ, USA; Arizona State University, Tempe, AZ, USA; NEC Laboratories America, Cupertino, CA, USA; NEC Laboratories America, Cupertino, CA, USA; University of California Santa Barbara, Santa Barbara, CA, USA
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

OLAP is an important tool in decision support. With the help of domain knowledge, such as hierarchies of attribute values, OLAP helps the user observe the effects of various decisions. Most OLAP operations assume that the available domain knowledge is precise; in particular, they assume that the hierarchy of values over which the user can navigate forms a taxonomy. In this paper, we first note that when multiple heterogeneous data sources are involved in gathering the data and the associated domain knowledge, the integrated knowledge base, constructed by combining locally available taxonomies based on concept matchings, may not itself be a taxonomy. Specifically, the existence of intersections among concepts from different sources compromises the tree structure of the integrated taxonomy and prevents effective use of hierarchical navigation techniques, such as drill-down and roll-up. To cope with this, we introduce concept un-classification, where a select few of the concepts are eliminated to ensure that the remaining structure is a navigable taxonomy without concept intersections. Since un-classifying originally classified data is not desirable, we consider ways to minimize un-classification in the process. We introduce a cost model that captures the imprecision caused by the un-classification process, and we formulate the problem of finding an un-classification strategy that eliminates intersections while adding minimal imprecision to the resulting structure. We show that, when performed naively, this task can be very costly, and we therefore propose a bottom-up preprocessing strategy that supports basic navigational analytics operations, such as drill-down and roll-up. Experiments over synthetic and real-life data confirm the effectiveness and efficiency of our approach.
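
To make the idea of concept intersections and un-classification concrete, the following is a minimal sketch, not the paper's algorithm or cost model. It assumes sibling concepts are represented as sets of data items, treats two concepts as intersecting when they share an item, and uses a simple greedy heuristic (drop the smallest conflicting concept first). The function names `find_intersections` and `greedy_unclassify`, the sample data, and the cost measure (the number of items left without any concept) are all illustrative assumptions.

```python
from itertools import combinations


def find_intersections(concepts):
    """Return pairs of concept names whose member sets share at least one item."""
    return [
        (a, b)
        for a, b in combinations(sorted(concepts), 2)
        if concepts[a] & concepts[b]
    ]


def greedy_unclassify(concepts):
    """Remove concepts until no two remaining concepts overlap.

    Greedy heuristic (an assumption, not the paper's strategy): while
    conflicts remain, drop the conflicting concept with the fewest members.
    Returns the remaining concepts, the removed names, and a toy cost:
    the number of items no longer covered by any remaining concept.
    """
    remaining = dict(concepts)
    removed = []
    while True:
        conflicts = find_intersections(remaining)
        if not conflicts:
            break
        involved = {name for pair in conflicts for name in pair}
        # Ties are broken by name so the result is deterministic.
        victim = min(involved, key=lambda n: (len(remaining[n]), n))
        removed.append(victim)
        del remaining[victim]
    all_items = set().union(*concepts.values())
    covered = set().union(*remaining.values())
    return remaining, removed, len(all_items - covered)


# Hypothetical sibling concepts merged from two sources; "t3" is in both.
concepts = {
    "laptops": {"t1", "t2", "t3"},
    "notebooks": {"t3", "t4"},
    "tablets": {"t5"},
}

remaining, removed, cost = greedy_unclassify(concepts)
print(removed, cost)
```

In this toy input, "laptops" and "notebooks" intersect, so one of them must be un-classified before the siblings can be navigated as a tree. A greedy choice like this one is not guaranteed to minimize imprecision, which is the reason the abstract frames strategy selection as an optimization problem.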