Locating noun phrases with finite state transducers

  • Authors:
  • Jean Senellart

  • Affiliations:
  • Université Paris VII, Paris

  • Venue:
  • COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
  • Year:
  • 1998

Abstract

We present a method for constructing, maintaining and consulting a database of proper nouns. We describe noun phrases composed of a proper noun and/or a description of a human occupation. They are formalized by finite-state transducers (FSTs) and large-coverage dictionaries and are applied to a corpus of newspapers. We take into account synonymy and hyperonymy. This first stage of our parsing procedure has a high degree of accuracy. We show how we can handle requests such as: 'Find all newspaper articles in a general corpus mentioning the French prime minister', or 'How is Mr. X referred to in the corpus; what have been his different occupations throughout the period over which our corpus extends?' In the first case, nontrivial occurrences of noun phrases are located, that is, phrases that do not contain words present in the request but contain either synonyms or proper nouns relevant to the request. The search results are far better than those obtained by a keyword-based engine. Most answers are correct, except for some cases of homonymy (where a human reader would also fail without more context). Also, the treatment of people having several different occupations is not fully resolved. For French, we have built a library of about one thousand such FSTs; English FSTs are under construction. The same method can be used to locate and propose new proper nouns, simply by replacing given proper names in the same FSTs with variables.
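
The idea in the abstract can be illustrated with a minimal sketch. This is not the author's system: the states, the dictionaries (OCCUPATIONS, TITLES, KNOWN_NAME_TOKENS), the sample sentence and every function name below are invented for illustration, and English toy data stands in for the French resources. The sketch shows a small deterministic transducer that reads tokens, maps occupation words to a canonical form (a stand-in for synonymy), and collects proper names. Building the same arcs with an open "variable" name slot instead of a dictionary lookup shows, under these assumptions, how new proper nouns could be proposed.

```python
import re

# Toy dictionaries, invented for this sketch (not taken from the paper).
OCCUPATIONS = {"premier": "prime minister", "senator": "senator"}  # surface -> canonical
TITLES = {"mr", "mrs", "ms"}
KNOWN_NAME_TOKENS = {"dupont"}

def is_title(tok): return tok.lower() in TITLES
def is_dot(tok): return tok == "."
def is_occupation(tok): return tok.lower() in OCCUPATIONS
def is_known_name(tok): return tok.lower() in KNOWN_NAME_TOKENS
def is_capitalized(tok): return tok[:1].isupper()

FINAL = {2, 3}  # 2: occupation read, 3: at least one name token read

def build_arcs(name_test, names_may_start):
    """Arcs are (input test, next state, output tag) triples, tried in order."""
    arcs = {
        0: [(is_title, 1, None), (is_occupation, 2, "occupation")],
        1: [(is_dot, 1, None), (name_test, 3, "name")],
        2: [(name_test, 3, "name")],
        3: [(name_test, 3, "name")],
    }
    if names_may_start:
        arcs[0].append((name_test, 3, "name"))
    return arcs

KNOWN_ARCS = build_arcs(is_known_name, names_may_start=True)
# Same arcs, but the name slot is a variable: any capitalized token is
# accepted, and only in the context of a title or an occupation.
VARIABLE_ARCS = build_arcs(is_capitalized, names_may_start=False)

def match_at(tokens, start, arcs):
    """Longest accepted path from tokens[start]: (end, occupation, name) or None."""
    state, pos = 0, start
    occupation, name, best = None, [], None
    while pos < len(tokens):
        arc = next((a for a in arcs.get(state, []) if a[0](tokens[pos])), None)
        if arc is None:
            break
        _, state, tag = arc
        if tag == "occupation":
            occupation = OCCUPATIONS[tokens[pos].lower()]
        elif tag == "name":
            name.append(tokens[pos])
        pos += 1
        if state in FINAL:
            best = (pos, occupation, " ".join(name) or None)
    return best

def find_phrases(text, arcs):
    """Scan left to right and return (occupation, name) pairs for each match."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    results, i = [], 0
    while i < len(tokens):
        m = match_at(tokens, i, arcs)
        if m is None:
            i += 1
        else:
            end, occupation, name = m
            results.append((occupation, name))
            i = end
    return results

text = "The premier met Mr. Dupont. Later, senator Anne Martin spoke."
print(find_phrases(text, KNOWN_ARCS))
print(find_phrases(text, VARIABLE_ARCS))
```

In this toy run, a request for "prime minister" would retrieve the first sentence even though it only contains the synonym "premier", and the variable-slot arcs additionally pair "senator" with the unlisted name "Anne Martin", which could then be proposed as a new dictionary entry. Hyperonymy, homonymy and people holding several occupations are not modeled here.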