Modeling and Querying E-Commerce Data in Hybrid Relational-XML DBMSs

  • Authors:
  • Lipyeow Lim; Haixun Wang; Min Wang

  • Affiliations:
  • IBM T. J. Watson Research Center; IBM T. J. Watson Research Center; IBM T. J. Watson Research Center

  • Venue:
  • ER '08 Proceedings of the 27th International Conference on Conceptual Modeling
  • Year:
  • 2008

Abstract

Data in many industrial application systems are often neither completely structured nor completely unstructured. Consequently, semi-structured data models such as XML have become popular as a lowest common denominator for managing such data. The problem is that although XML is adequate to represent the flexible portion of the data, it fails to exploit the highly structured portion. XML normalization theory could be used to factor out the structured portion of the data at the schema level; however, queries written against the original schema no longer run on the normalized XML data. In this paper, we propose a new approach called eXtricate that stores XML documents in a space-efficient, decomposed way while supporting efficient processing of the original queries. Our method exploits the fact that a considerable amount of information is shared among similar XML documents. By regarding each document as consisting of a shared framework and a small diff script, we can leverage the strengths of both the relational and XML data models at the same time to handle such data effectively. We prototyped our approach on top of DB2 9 pureXML (a commercial hybrid relational-XML DBMS). Our experiments validate the amount of redundancy in real e-catalog data and show the effectiveness of our method.
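
The idea of a shared framework plus a small per-document diff can be illustrated with a minimal Python sketch. This is not the eXtricate algorithm itself, which the abstract does not spell out. It assumes each document can be flattened into leaf paths with unique sibling tag names, and the function names (`leaf_paths`, `decompose`, `reconstruct`) and the sample product documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def leaf_paths(xml_text):
    """Flatten a document into {path: text} for its leaf elements.

    Assumption for this sketch: sibling tag names are unique, so a path
    identifies exactly one leaf.
    """
    root = ET.fromstring(xml_text)
    leaves = {}

    def walk(node, path):
        children = list(node)
        if not children:
            leaves[path] = (node.text or "").strip()
        for child in children:
            walk(child, path + "/" + child.tag)

    walk(root, "/" + root.tag)
    return leaves


def decompose(docs):
    """Split documents into one shared part and one small diff per document."""
    flattened = [leaf_paths(d) for d in docs]
    # Shared part: (path, value) pairs that occur in every document.
    shared = dict(set.intersection(*(set(f.items()) for f in flattened)))
    # Diff: whatever each document has beyond the shared part.
    diffs = [{p: v for p, v in f.items() if p not in shared} for f in flattened]
    return shared, diffs


def reconstruct(shared, diff):
    """Rebuild one document's flattened view from the shared part and its diff."""
    return {**shared, **diff}


# Hypothetical e-catalog entries that differ only in their specifications.
doc_a = ("<product><category>TV</category><brand>Acme</brand>"
         "<specs><size>42</size></specs></product>")
doc_b = ("<product><category>TV</category><brand>Acme</brand>"
         "<specs><size>50</size><hdr>yes</hdr></specs></product>")

shared, diffs = decompose([doc_a, doc_b])
print(shared)
print(diffs)
```

In this toy setting the shared part holds the category and brand once, while each diff holds only the specification values that vary. Merging the shared part with a diff gives back that document's original flattened view, which loosely mirrors the requirement that queries against the original schema still work on the decomposed data.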