A self-learning universal concept spotter

  • Authors:
  • Tomek Strzalkowski; Jin Wang

  • Affiliations:
  • GE Corporate Research and Development, Schenectady, NY (both authors)

  • Venue:
  • COLING '96: Proceedings of the 16th Conference on Computational Linguistics, Volume 2
  • Year:
  • 1996

Abstract

We describe the Universal Spotter, a system for identifying in-text references to entities of an arbitrary, user-specified type, such as people, organizations, equipment, products, or materials. Starting with some initial seed examples and a training text corpus, the system generates rules that find further concepts of the same type. The user provides the initial seed information either as a typical lexical context in which the entities to be spotted occur, e.g., "the name ends with Co." or "to the right of 'produced' or 'made'", or simply as examples of the concept itself, e.g., Ford Taurus, gas turbine, Big Mac. Negative examples can also be supplied, if known. Given a sufficiently large training corpus, the system then runs an unsupervised learning process in which it: (1) finds instances of the sought-after concept using the seed-context information while maximizing recall and precision; (2) finds additional contexts in which these entities occur; and (3) expands the initial seed context with selected new contexts to find even more entities. Preliminary results of creating spotters for organizations and products are discussed.
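The bootstrapping cycle in steps (1) to (3) can be illustrated with a toy sketch. This is not the authors' algorithm: the candidate heuristic (maximal runs of capitalized tokens), the one-word left/right context representation, and the thresholds `min_support` and `min_precision` are all hypothetical simplifications chosen only to show how seed contexts, seed examples, and negative examples can feed one loop.

```python
from collections import Counter

# Toy sketch of a seed-context bootstrapping loop. NOT the Universal Spotter's
# actual algorithm: candidate detection, context features, and thresholds are
# hypothetical simplifications for illustration.


def candidates(tokens):
    """Yield (start, end) spans of maximal runs of capitalized tokens (hypothetical heuristic)."""
    i = 0
    while i < len(tokens):
        if tokens[i][:1].isupper():
            j = i
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            yield i, j
            i = j
        else:
            i += 1


def contexts_of(tokens, start, end):
    """One-word contexts: ("L", word left of the span) and ("R", word right of it)."""
    ctx = set()
    if start > 0:
        ctx.add(("L", tokens[start - 1].lower()))
    if end < len(tokens):
        ctx.add(("R", tokens[end].lower()))
    return ctx


def bootstrap(corpus, seed_contexts=(), seed_entities=(), negatives=(),
              rounds=3, min_support=2, min_precision=0.5):
    contexts, entities, negatives = set(seed_contexts), set(seed_entities), set(negatives)
    for _ in range(rounds):
        # (1) Spot candidates that occur in any currently accepted context.
        for tokens in corpus:
            for s, e in candidates(tokens):
                name = " ".join(tokens[s:e])
                if name not in negatives and contexts_of(tokens, s, e) & contexts:
                    entities.add(name)
        # (2) Find the contexts around known entities, counting how often each
        #     context surrounds a known entity vs. any candidate at all.
        hits, total = Counter(), Counter()
        for tokens in corpus:
            for s, e in candidates(tokens):
                name = " ".join(tokens[s:e])
                for c in contexts_of(tokens, s, e):
                    total[c] += 1
                    if name in entities:
                        hits[c] += 1
        # (3) Expand the context set with frequent, reliable contexts.
        new = {c for c, n in hits.items()
               if n >= min_support and n / total[c] >= min_precision}
        if new <= contexts:  # nothing new learned: stop early
            break
        contexts |= new
    return entities, contexts


corpus = [s.split() for s in [
    "they produced Taurus cars",
    "we produced Sable cars",
    "dealers sold Escort cars",
    "Detroit cars rusted",
]]
found, learned = bootstrap(corpus, seed_contexts={("L", "produced")},
                           negatives={"Detroit"})
print(sorted(found))    # ['Escort', 'Sable', 'Taurus']
print(sorted(learned))  # [('L', 'produced'), ('R', 'cars')]
```

Starting from the single seed context "to the right of produced", the first round finds Taurus and Sable. The right context "cars" then passes both thresholds (2 of its 4 occurrences surround known entities), so the second round also picks up Escort, while the negative example Detroit is never accepted. Passing `seed_entities={"Taurus", "Sable"}` with no seed contexts reaches the same result, mirroring the two seeding modes the abstract describes.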