Pay-as-you-go: an adaptive approach to provide full context-aware text search over document content

  • Authors:
  • Zhen Hua Liu; Thomas Baby; Sukhendu Chakraborty; Junyan Ding; Anguel Novoselsky; Vikas Arora

  • Affiliations:
  • Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA

  • Venue:
  • Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 2010

Abstract

An RDBMS provides the best performance for querying structured data that starts out with a well-defined schema. However, such a 'schema first, data later' approach does not work for unstructured data or data with little structure. Therefore, an RDBMS typically stores such data without any schema in LOB columns (for example, Character Large Object (CLOB) or Binary Large Object (BLOB) columns) and provides Information-Retrieval (IR) style, keyword-based search capability over these LOB columns. More recently, XML has been introduced as a native datatype (XMLType) in RDBMSs via the SQL/XML standard. Semi-structured data, with or without a schema, can be stored in such XMLType columns, and XQuery provides query capability over them. In particular, the XQuery Full Text specification provides the capability of searching for keywords within document context. Such full context-aware text search is more powerful than pure keyword search, since the user can now specify the fine-grained context in which the keywords should occur. However, XML with XQuery full-text search requires that the user first convert her text data into XML and store it in an XMLType column. Such a massive physical data migration, with possible loss of document fidelity and potential impact on existing production environments, is often expensive enough that users are reluctant to adopt the XML/XQuery approach. In this paper, we propose a pay-as-you-go architecture that provides an XML text view over LOB columns, so that users can take advantage of context-aware full-text search capability adaptively. This adaptive architecture includes a novel XML text index that can be created over the LOB column where the content is stored. The XML text index supports an XML text view over the LOB data, on top of which XQuery full-text search becomes feasible. Such an adaptive index/view approach is minimally intrusive to existing data, as it requires no physical data migration.
We describe the design and challenges of building such an adaptive XML text index. Furthermore, we advocate that the pay-as-you-go approach provides the integration bridge between the structured relational world and the text-oriented document world, and fulfills the primary motivation of XML in the database.
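
To make the contrast between pure keyword search and context-aware search concrete, the following is a minimal sketch in Python. It is not the XML text index described in the paper, whose design the abstract does not detail. The sample documents, the `CLOB_ROWS` name, and both function names are hypothetical, and the sketch assumes the stored text happens to be well-formed markup. The stored strings are never modified; an XML view of each one is built only at query time.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows: plain strings standing in for CLOB column values.
CLOB_ROWS = {
    1: "<report><title>Index design</title><body>Notes on storage cost.</body></report>",
    2: "<report><title>Storage cost</title><body>Notes on index design.</body></report>",
}


def keyword_search(rows, keyword):
    """IR-style search: match the keyword anywhere in the stored text."""
    kw = keyword.lower()
    return [row_id for row_id, text in rows.items() if kw in text.lower()]


def context_search(rows, element, keyword):
    """Context-aware search: match the keyword only inside the named element."""
    kw = keyword.lower()
    hits = []
    for row_id, text in rows.items():
        # Build an XML view of the stored text on the fly; the row itself is untouched.
        root = ET.fromstring(text)
        for node in root.iter(element):
            if kw in "".join(node.itertext()).lower():
                hits.append(row_id)
                break
    return hits


if __name__ == "__main__":
    print(keyword_search(CLOB_ROWS, "index"))           # both rows mention "index"
    print(context_search(CLOB_ROWS, "title", "index"))  # only row 1 has it in <title>
```

Here the keyword search returns both rows, while restricting the match to the `title` element returns only the first. A real system would answer such queries from an index rather than by parsing every row per query; the sketch only illustrates what "context" adds to the search condition.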