Automatic semantic tagging of unknown proper names

  • Authors:
  • Alessandro Cucchiarelli;Danilo Luzi;Paola Velardi

  • Affiliations:
  • Università di Ancona, Istituto di Informatica, Ancona, Italia;Università di Ancona, Istituto di Informatica, Ancona, Italia;Università di Roma 'La Sapienza', Roma, Italia

  • Venue:
  • COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
  • Year:
  • 1998

Abstract

Implemented methods for proper name recognition rely on large gazetteers of common proper nouns and a set of heuristic rules (e.g. Mr. as an indicator of a PERSON entity type). Though the performance of current proper name (PN) recognizers is very high (over 90%), it is important to note that this problem is by no means a "solved problem". Existing systems perform extremely well on newswire corpora by virtue of the availability of large gazetteers and rule bases designed for specific tasks (e.g. recognition of the Organization and Person entity types specified in the recent Message Understanding Conferences, MUC). However, large gazetteers are not available for most languages, nor for applications other than newswire texts, and in any case proper nouns are an open class.

In this paper we describe a context-based method to assign an entity type to unknown PNs. Like many others, our system relies on a gazetteer and a set of context-dependent heuristics to classify proper nouns. However, because large gazetteers are unavailable for Italian, over 20% of detected PNs cannot be semantically tagged. The algorithm that we propose assigns an entity type to an unknown PN based on the analysis of syntactically and semantically similar contexts already seen in the application corpus. The performance of the algorithm is evaluated not only in terms of precision, following the tradition of the MUC conferences, but also in terms of Information Gain, an information-theoretic measure that takes into account the complexity of the classification task.
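To make the idea of context-based tagging more concrete, here is a minimal sketch, not the authors' algorithm. It assumes that a context is a (syntactic relation, head word) pair, and that "semantic similarity" between contexts can be approximated by a hypothetical `WORD_CLASS` map that groups head words into coarse classes. PNs already tagged by the gazetteer populate an index of entity types per generalized context. An unknown PN then receives the entity type that co-occurs most often with its own generalized contexts. The relation names, word classes, and helper functions are all illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical word-class map standing in for semantic similarity of head words.
# A context is assumed to be a (syntactic_relation, head_word) pair.
WORD_CLASS = {
    "announce": "communication",
    "declare": "communication",
    "buy": "transaction",
    "acquire": "transaction",
}


def generalize(context):
    """Map a context to a coarser key: its relation plus the head's word class."""
    relation, head = context
    return relation, WORD_CLASS.get(head, head)


def build_index(tagged_occurrences):
    """Count entity types seen with each generalized context among known PNs."""
    index = defaultdict(Counter)
    for context, entity_type in tagged_occurrences:
        index[generalize(context)][entity_type] += 1
    return index


def classify_unknown(contexts, index):
    """Vote over entity types seen with similar contexts; None if there is no evidence."""
    votes = Counter()
    for context in contexts:
        # .get does not trigger defaultdict's factory, so the index is not modified.
        votes.update(index.get(generalize(context), Counter()))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Contexts of PNs already tagged via the gazetteer (illustrative data only).
    tagged = [
        (("subj_of", "announce"), "PERSON"),
        (("subj_of", "declare"), "PERSON"),
        (("obj_of", "acquire"), "ORGANIZATION"),
        (("obj_of", "buy"), "ORGANIZATION"),
    ]
    index = build_index(tagged)
    print(classify_unknown([("subj_of", "declare")], index))  # PERSON
    print(classify_unknown([("obj_of", "buy")], index))       # ORGANIZATION
    print(classify_unknown([("near", "river")], index))       # None
```

A simple majority vote keeps the sketch short. A realistic system would likely weight contexts by reliability and abstain when the vote is too close. Ties here are broken by the order in which entity types were first counted.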