Mining semantics for large scale integration on the web: evidences, insights, and challenges

  • Authors:
  • Kevin Chen-Chuan Chang; Bin He; Zhen Zhang

  • Affiliations:
  • University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign

  • Venue:
  • ACM SIGKDD Explorations Newsletter
  • Year:
  • 2004

Abstract

The Web has been rapidly "deepened" by myriad searchable databases online, where data are hidden behind query interfaces. Toward large scale integration over this "deep Web," we face a new challenge: with its dynamic and ad hoc nature, such large scale integration mandates dynamic semantics discovery. That is, we must cope on the fly with the "semantics" of dynamically discovered sources, without pre-configured source-specific knowledge. To tackle this challenge, our initial work hinges on the insight that the large scale is itself a unique opportunity: we observe that the desired "semantics" often connects to surface presentation characteristics through hidden regularities across many sources. Such regularities can be leveraged to enable semantics discovery. In particular, we report our evidence from three initial tasks for integrating the deep Web: interface extraction, schema matching, and query translation. Generalizing from this specific evidence, we propose our "unified insight" of "mining" semantics for large scale integration by exploiting hidden regularities holistically across sources. Further, to fulfill the promise of such holistic mining, we discuss the challenges in realizing it for dynamic semantics discovery. Based on our initial work and several related efforts, we believe that our unified insight, holistic mining for semantics discovery, is a promising methodology for enabling large scale integration.