Gaining business insights, such as measuring the effectiveness of a product campaign, requires integrating a multitude of different data sources. Such data sources include in-house applications (like CRM and ERP), partner databases (like loyalty card data from retailers), and syndicated data sources (like credit reports from Experian). However, different data sources represent the same semantic attributes in different ways. For example, two XML schemas for purchase orders may represent price as /SAP46Order/Product/Price and /PeopleSoft/Item/Sold/Cost, respectively. Because the path to the same semantic information depends on the schema, the data is difficult to index, and query languages such as XQuery have difficulty processing aggregation queries over it. Shredding the XML documents is not feasible due to the vast number of different schemas and the complexity of the XML documents. The only known approach today is to ETL every single document into a common schema and then use XQuery on the transformed data to perform aggregation. Such a solution does not scale well with the number of schemas or with their natural evolution. This paper presents a robust solution to document-centric OLAP over highly heterogeneous data. The solution exploits text indexing, which provides the necessary flexibility, together with well-established techniques for aggregation (like star joins and bitmap processing). We present the overall architecture and the experimental performance results from our implementation.
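To make the path-heterogeneity problem concrete, the following is a minimal sketch, not the system described in this paper. It assumes a toy in-memory inverted index keyed by element path and a hand-written set of paths that all denote "price". The names PRICE_PATHS, DOCS, build_path_index, and total_for_attribute, as well as the sample documents and values, are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical mapping: schema-specific paths that all mean "price".
PRICE_PATHS = {
    "/SAP46Order/Product/Price",
    "/PeopleSoft/Item/Sold/Cost",
}

# Made-up sample documents in two different purchase-order schemas.
DOCS = [
    "<SAP46Order><Product><Price>10.50</Price></Product></SAP46Order>",
    "<PeopleSoft><Item><Sold><Cost>4.25</Cost></Sold></Item></PeopleSoft>",
    "<SAP46Order><Product><Price>5.25</Price></Product></SAP46Order>",
]


def leaf_paths(elem, prefix=""):
    """Yield (absolute path, text) for every leaf element under elem."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield path, (elem.text or "").strip()
    for child in children:
        yield from leaf_paths(child, path)


def build_path_index(docs):
    """Build a toy inverted index: path -> list of (doc id, value)."""
    index = defaultdict(list)
    for doc_id, xml_text in enumerate(docs):
        root = ET.fromstring(xml_text)
        for path, value in leaf_paths(root):
            index[path].append((doc_id, value))
    return index


def total_for_attribute(index, paths):
    """Sum numeric values across all paths mapped to one attribute."""
    return sum(float(v) for p in paths for _, v in index.get(p, []))


index = build_path_index(DOCS)
total_price = total_for_attribute(index, PRICE_PATHS)
print(total_price)
```

In this sketch a new schema only requires adding its path to the set, rather than transforming every document into a common schema first. How the actual system indexes paths and evaluates aggregates is not specified in the abstract, so none of this should be read as its design.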