From regular expressions to nested words: unifying languages and query execution for relational and XML sequences

  • Authors:
  • Barzan Mozafari; Kai Zeng; Carlo Zaniolo

  • Affiliations:
  • University of California at Los Angeles, California; University of California at Los Angeles, California; University of California at Los Angeles, California

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2010

Abstract

There is growing interest in query language extensions for pattern matching over event streams and stored database sequences, driven by the many important applications that such extensions make possible. This demand has led DBMS vendors and DSMS venture companies to propose Kleene-closure extensions of SQL standards, building on seminal research that demonstrated that such constructs are both effective and amenable to efficient implementation. These extensions, however powerful, suffer from limitations that severely impair their effectiveness in many real-world applications. To overcome these problems, we have designed the K*SQL language and system, based on our investigation of nested words, a recent model that generalizes both words and trees. K*SQL extends the existing relational sequence languages and also enables applications in other domains such as genomics, software analysis, and XML processing. At the same time, K*SQL remains extremely efficient, thanks to our powerful optimizations for pattern search over nested words. Furthermore, we show that other sequence languages and XPath can be automatically translated into K*SQL, allowing K*SQL to also serve as a high-performance query execution back-end for those languages. Therefore, K*SQL is a unifying SQL-based engine for sequence and XML queries, which provides novel optimization techniques for both.