Building a terminological database from heterogeneous definitional sources

  • Authors:
  • Smaranda Muresan;Samuel D. Popper;Peter T. Davis;Judith L. Klavans

  • Affiliations:
  • Columbia University, New York, NY;Columbia University, New York, NY;Columbia University, New York, NY;Columbia University, New York, NY

  • Venue:
  • dg.o '03 Proceedings of the 2003 annual national conference on Digital government research
  • Year:
  • 2003

Abstract

An obstacle to understanding results across heterogeneous databases is the difficulty of determining conceptual connections between differing terminologies. In this paper, we present the two-step approach we used to build a terminological database that addresses this issue. First, we automatically built a heterogeneous collection of terms and definitions from two types of dynamic sources: (1) glossaries automatically identified on 147 government web sites and (2) definitions extracted from 600 unstructured articles. Second, after storing the terms and their definitions, we semantically analyzed the definitions and stored the resulting terminological knowledge in a relational database. Currently, the database contains 12,780 definitions of 8,431 terms.
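
As a rough illustration of the storage side of this approach, the sketch below shows one way a relational store could hold several definitions per term, each tagged with the kind of source it came from. It is not the authors' schema: the table names (`term`, `definition`), the `source_type` column, the helper functions, and the sample rows are all assumptions made for this example, and the semantic analysis step is not modeled.

```python
import sqlite3


def build_db(conn):
    """Create a hypothetical two-table schema: one row per term,
    and any number of definitions pointing back to a term."""
    conn.executescript(
        """
        CREATE TABLE term (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE definition (
            id          INTEGER PRIMARY KEY,
            term_id     INTEGER NOT NULL REFERENCES term(id),
            text        TEXT NOT NULL,
            -- assumed labels for the two source types named in the abstract
            source_type TEXT NOT NULL
                CHECK (source_type IN ('glossary', 'article'))
        );
        """
    )


def add_definition(conn, term, text, source_type):
    """Insert the term if it is new, then attach the definition to it."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (term,))
    (term_id,) = conn.execute(
        "SELECT id FROM term WHERE name = ?", (term,)
    ).fetchone()
    conn.execute(
        "INSERT INTO definition (term_id, text, source_type) VALUES (?, ?, ?)",
        (term_id, text, source_type),
    )


def counts(conn):
    """Return (number of distinct terms, number of definitions)."""
    n_terms = conn.execute("SELECT COUNT(*) FROM term").fetchone()[0]
    n_defs = conn.execute("SELECT COUNT(*) FROM definition").fetchone()[0]
    return n_terms, n_defs


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_db(conn)
    # Invented sample rows, for illustration only.
    add_definition(conn, "wetland", "placeholder glossary definition", "glossary")
    add_definition(conn, "wetland", "placeholder article definition", "article")
    add_definition(conn, "aquifer", "placeholder glossary definition", "glossary")
    print(counts(conn))
```

Keeping terms and definitions in separate tables reflects the one-to-many relationship implied by the reported totals, where definitions outnumber terms.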