Improving XML schema matching performance using Prüfer sequences

  • Authors:
  • Alsayed Algergawy;Eike Schallehn;Gunter Saake

  • Affiliations:
  • Department of Computer Science, Otto-von-Guericke University, 39016 Magdeburg, Germany;Department of Computer Science, Otto-von-Guericke University, 39016 Magdeburg, Germany;Department of Computer Science, Otto-von-Guericke University, 39016 Magdeburg, Germany

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2009

Abstract

Schema matching is a critical step for discovering semantic correspondences among elements in many data-sharing applications. Most existing schema matching algorithms produce similarity scores between schema elements and, as a result, discover only simple matches. Such results solve the problem only partially. Identifying and discovering complex matches is considered one of the biggest obstacles to solving the schema matching problem completely. Another obstacle is the scalability of matching algorithms to large numbers of schemas and to large-scale schemas. To tackle these challenges, we propose a new XML schema matching framework based on Prüfer encoding. In particular, we develop and implement the XPruM system, which consists of two main parts: schema preparation and schema matching. First, we parse XML schemas and represent them internally as schema trees. A Prüfer sequence is then constructed for each schema tree, giving a sequence representation of the schema. We capture the semantic information of a schema tree in Label Prüfer Sequences (LPS) and its structural information in Number Prüfer Sequences (NPS). Next, we develop a new structural matching algorithm that exploits both LPS and NPS. To cope with complex match discovery, we introduce the concept of compatible nodes: semantic correspondences are first identified across complex elements, and the matching process is then refined to identify correspondences among the simple elements inside each pair of compatible nodes. Our experimental results demonstrate the performance benefits of the XPruM system.
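
To make the sequence representation concrete, the following is a minimal Python sketch of how a pair of number and label sequences could be derived from a schema tree. It is not the XPruM implementation. It assumes that nodes are numbered in postorder, that the sequence is built by repeatedly deleting the leaf with the smallest number, that the NPS records the parent's number at each deletion, and that the LPS records the label of the deleted node; the paper's exact conventions may differ. The `Node` class, the `prufer_sequences` function, and the purchase-order tree are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A schema tree node (hypothetical helper type, not from the paper)."""
    label: str
    children: list[Node] = field(default_factory=list)


def prufer_sequences(root: Node) -> tuple[list[int], list[str]]:
    """Return (NPS, LPS) for the tree rooted at `root`.

    Assumption: nodes are numbered in postorder. Under that numbering,
    the smallest-numbered leaf is always the next node in postorder,
    so leaves are deleted in the order 1, 2, ..., n-1 and a single
    traversal is enough to build both sequences.
    """
    postorder: list[tuple[Node, Node | None]] = []

    def visit(node: Node, parent: Node | None) -> None:
        for child in node.children:
            visit(child, node)
        postorder.append((node, parent))

    visit(root, None)

    # Postorder numbers start at 1; the root receives the largest number.
    number = {id(node): i + 1 for i, (node, _) in enumerate(postorder)}

    nps: list[int] = []   # structural part: parent's number per deletion
    lps: list[str] = []   # semantic part: label of the deleted node (assumed)
    for node, parent in postorder:
        if parent is None:
            continue  # the root is never deleted
        nps.append(number[id(parent)])
        lps.append(node.label)
    return nps, lps


# Hypothetical schema tree used only for illustration.
tree = Node("PurchaseOrder", [
    Node("ShipTo", [Node("Name"), Node("Address")]),
    Node("Item"),
])
nps, lps = prufer_sequences(tree)
print(nps)  # [3, 3, 5, 5]
print(lps)  # ['Name', 'Address', 'ShipTo', 'Item']
```

In this sketch, numbers that never appear in the NPS (here 1, 2, and 4) correspond to leaf nodes, that is, simple elements, while the numbers that do appear (3 and 5) identify inner nodes, that is, complex elements. This is one way the structural sequence could support the distinction between complex and simple elements that the compatible-node idea relies on.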