Extending metadata definitions by automatically extracting and organizing glossary definitions

  • Authors:
  • Eduard Hovy; Andrew Philpot; Judith Klavans; Ulrich Germann; Peter T. Davis

  • Affiliations:
  • University of Southern California, Marina del Rey, CA; University of Southern California, Marina del Rey, CA; Columbia University, New York, NY; University of Southern California, Marina del Rey, CA; Columbia University, New York, NY

  • Venue:
  • dg.o '03 Proceedings of the 2003 annual national conference on Digital government research
  • Year:
  • 2003

Abstract

Metadata descriptions of database contents are required to build and use systems that access and deliver data in response to user requests. When numerous heterogeneous databases are brought together in a single system, their various metadata formalizations must be homogenized and integrated to support the access planning and delivery system. This integration is a tedious process that requires human expertise and attention. In this paper, we describe a method for speeding up the formalization and integration of new metadata. The method takes advantage of the fact that databases are often described in web pages containing natural language glossaries that define pertinent aspects of the data. Given a root URL, our method identifies likely glossaries, extracts and formalizes aspects of the relevant concepts defined in them, and automatically integrates the newly formalized metadata concepts into a large model of the domain and its associated conceptualizations. This demo will show the end-to-end performance of this system.

The demonstration will show the concept acquisition and placement process. Using the AskCal interface (Philpot et al., 2002), we will ask the viewer to interact with the system to retrieve some data. We will then introduce a new topic, one for which the ontology does not yet contain a concept, and browse the ontology to verify this. We will identify candidate glossary web pages or sites using prior work (Klavans et al., 2002) and/or a web-accessible term finder. In a separate window, we will then display a glossary-containing file from www.eia.gov or a similar site, in which the concept is defined in text. Using the system described in the accompanying paper, we will enter the root URL and activate the glossary analysis and concept alignment procedures. When processing finishes, the system will report the acquisition and placement of all the concepts it has found. We will then re-display the ontology in the browser.
Newly acquired concepts will be displayed in a different color. The user will be free to click on the concepts to examine their formalized contents and to follow a link back to the glossary source page for verification. This demo will thus illustrate not only our latest work but also the major part of the EDC system that we have been building over the past four years.
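The abstract describes the pipeline only at a high level: start from a root URL, find likely glossaries, extract and formalize the defined concepts, and place them in the domain model. As a rough illustration, the Python sketch below walks that path over pages assumed to be already fetched from under the root URL. Everything in it is hypothetical. Glossary entries are assumed to use HTML `<dt>`/`<dd>` markup. A page counts as a glossary if it mentions "glossary" or "definitions" and yields at least three entries. A new concept is attached under whichever existing ontology node has the highest Jaccard word overlap with its definition. That overlap score is a simple stand-in, not the EDC system's alignment procedure, and `ONTOLOGY`, `PAGE`, and the example.org URL are invented sample data. Each `Concept` keeps its source URL and an `is_new` flag, mirroring how the demo color-codes new concepts and links them back to the glossary page.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional

STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "its", "of", "or", "per", "the", "to"}  # hypothetical


@dataclass
class Concept:
    name: str
    definition: str
    parent: Optional[str]  # ontology node the new concept is attached under
    source_url: str        # glossary page it came from, for click-back verification
    is_new: bool = True    # lets a browser render newly acquired concepts differently


class GlossaryParser(HTMLParser):
    """Collects <dt>term</dt><dd>definition</dd> pairs (assumed glossary markup)."""
    def __init__(self):
        super().__init__()
        self.pairs, self._buf = [], []
        self._current = self._term = None  # _current is "dt", "dd", or None

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._flush()  # </dt> and </dd> are optional in HTML
            self._current = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd", "dl"):
            self._flush()

    def handle_data(self, data):
        if self._current:
            self._buf.append(data)

    def _flush(self):
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if self._current == "dt":
            self._term = text
        elif self._current == "dd" and self._term:
            self.pairs.append((self._term, text))
            self._term = None
        self._current, self._buf = None, []


def extract_pairs(html):
    parser = GlossaryParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


def looks_like_glossary(html, pairs, min_entries=3):
    # Crude heuristic: the page mentions a glossary and yields enough entries.
    return bool(re.search(r"glossary|definitions", html, re.IGNORECASE)) and len(pairs) >= min_entries


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def place(term, definition, ontology, url):
    """Attach under the node with the highest Jaccard word overlap (a stand-in for real alignment)."""
    new = tokens(term + " " + definition)
    best, best_score = None, 0.0
    for node, description in ontology.items():
        old = tokens(node + " " + description)
        score = len(new & old) / len(new | old) if new | old else 0.0
        if score > best_score:
            best, best_score = node, score
    return Concept(term, definition, best, url)


def acquire(pages, ontology):
    """pages maps URL -> HTML already fetched below the root URL (crawling is omitted)."""
    known = [tokens(node) for node in ontology]
    concepts = []
    for url, html in pages.items():
        pairs = extract_pairs(html)
        if looks_like_glossary(html, pairs):
            for term, definition in pairs:
                if tokens(term) not in known:  # skip terms the ontology already models
                    concepts.append(place(term, definition, ontology, url))
    return concepts


# Hypothetical sample data; a real run would fetch pages starting from the root URL.
ONTOLOGY = {"Gasoline_Price": "retail price of motor gasoline per gallon",
            "Electricity_Generation": "electric power produced by generating plants"}
PAGE = """<html><head><title>Energy Glossary</title></head><body><dl>
<dt>Diesel fuel</dt><dd>A fuel oil used in compression-ignition engines; its retail price per gallon is tracked weekly.</dd>
<dt>Capacity factor</dt><dd>Ratio of electric power actually produced by a generating unit to its maximum possible output.</dd>
<dt>Gasoline price</dt><dd>Retail price of motor gasoline.</dd>
</dl></body></html>"""

if __name__ == "__main__":
    for c in acquire({"http://example.org/glossary.html": PAGE}, ONTOLOGY):
        print(f"{c.name} -> {c.parent} (from {c.source_url})")
```

Run as a script, the sketch attaches "Diesel fuel" under `Gasoline_Price` and "Capacity factor" under `Electricity_Generation`, and it skips "Gasoline price" because the toy ontology already models it. A realistic aligner would need much stronger evidence than shared words before committing a placement. The sketch only shows where that step sits in the pipeline.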