An Algebraic Language for Semantic Data Integration on the Hidden Web

  • Authors:
  • Shazzad Hosain; Hasan Jamil

  • Venue:
  • ICSC '09 Proceedings of the 2009 IEEE International Conference on Semantic Computing
  • Year:
  • 2009

Abstract

Semantic integration in the hidden Web is an emerging area of research where traditional assumptions do not always hold. Frequent changes, conflicts, and the sheer size of the hidden Web demand vastly different integration techniques that rely on autonomous detection and resolution of heterogeneity, correspondence establishment, and information extraction strategies. In this paper, we present an algebraic language, called Integra, as a foundation for another SQL-like query language called BioFlow, for the integration of Life Sciences data on the hidden Web. The algebra presented here adopts the view that Web forms can be treated as user-defined functions and that the responses they generate from back-end databases can be considered traditional relations or tables. These assumptions allow us to extend the traditional relational algebra to include integration primitives such as schema matching, wrappers, form submission, and object identification as a family of database functions. These functions are then incorporated into the traditional relational algebra operators to extend them in the direction of semantic data integration. To support the well-known concepts of horizontal and vertical integration, we also propose two new operators called link and combine. We show that this family of functions can be designed from existing literature and that its implementation is completely orthogonal to our language, in the same way that the implementation of many database technologies (such as the relational join operation) is. Finally, we show that for traditional relations without integration, our algebra reduces to classical relational algebra, establishing the latter as a special case of Integra.
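
The view described in the abstract can be illustrated with a minimal Python sketch. It is not the Integra algebra itself, whose operator definitions are not given here. Every name below (gene_form, protein_form, match_schema, same_object) and all data values are hypothetical, and the sketch assumes that link corresponds to horizontal (join-like) integration and combine to vertical (union-like) integration, which the ordering in the abstract suggests but does not state outright.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Relation = List[Row]

# Hypothetical stand-ins for two hidden-Web forms. A real system would submit
# a form over HTTP and run a wrapper over the response page; here each "form"
# is just a function from a parameter to a relation (a list of rows).
def gene_form(organism: str) -> Relation:
    data = [
        {"gene": "BRCA1", "organism": "human"},
        {"gene": "TP53", "organism": "human"},
        {"gene": "Trp53", "organism": "mouse"},
    ]
    return [r for r in data if r["organism"] == organism]

def protein_form(organism: str) -> Relation:
    data = [
        {"gene_symbol": "brca1", "protein": "P38398", "species": "human"},
        {"gene_symbol": "tp53", "protein": "P04637", "species": "human"},
    ]
    return [r for r in data if r["species"] == organism]

def match_schema(rel: Relation, mapping: Dict[str, str]) -> Relation:
    """Rename attributes using a (here hand-written) schema-matching result."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rel]

def same_object(a: str, b: str) -> bool:
    """Toy object identification: case-insensitive equality (an assumption)."""
    return a.lower() == b.lower()

def link(left: Relation, right: Relation, key: str,
         identify: Callable[[str, str], bool] = same_object) -> Relation:
    """Assumed horizontal integration: a join whose match test is pluggable."""
    out = []
    for l in left:
        for r in right:
            if identify(l[key], r[key]):
                out.append({**r, **l})  # left values win on shared attributes
    return out

def combine(left: Relation, right: Relation) -> Relation:
    """Assumed vertical integration: a union with duplicate rows removed."""
    seen, out = set(), []
    for row in left + right:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(row)
    return out

genes = gene_form("human")
proteins = match_schema(protein_form("human"),
                        {"gene_symbol": "gene", "species": "organism"})
linked = link(genes, proteins, "gene")
all_genes = combine(gene_form("human"), gene_form("mouse"))
```

Passing plain equality as the identify argument, and skipping match_schema, turns link into an ordinary equi-join on the key. That is one informal way to read the abstract's closing claim that the algebra reduces to classical relational algebra when no integration is involved.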