Tailoring the automated construction of large-scale taxonomies using the web

  • Authors:
  • Zornitsa Kozareva; Eduard Hovy

  • Affiliations:
  • USC Information Sciences Institute, Marina del Rey, USA 90292-6695;USC Information Sciences Institute, Marina del Rey, USA 90292-6695

  • Venue:
  • Language Resources and Evaluation
  • Year:
  • 2013

Abstract

It has long been a dream to have available a single, centralized, semantic thesaurus or terminology taxonomy to support research in a variety of fields. Much human and computational effort has gone into constructing such resources, including the original WordNet and subsequent wordnets in various languages. To produce such resources one has to overcome well-known problems in achieving both wide coverage and internal consistency within a single wordnet and across many wordnets. In particular, one has to ensure that alternative valid taxonomizations covering the same basic terms are recognized and treated appropriately. In this paper we describe a pipeline of new, powerful, minimally supervised, automated algorithms that can be used to construct terminology taxonomies and wordnets, in various languages, by harvesting large amounts of online domain-specific or general text. We illustrate the effectiveness of the algorithms both in building localized, domain-specific wordnets and in highlighting and investigating certain deeper ontological problems such as parallel generalization hierarchies. We show shortcomings and gaps in the manually constructed English WordNet in various domains.