Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods

  • Authors:
  • Georgios Petasis;Alessandro Cucchiarelli;Paola Velardi;Georgios Paliouras;Vangelis Karkaletsis;Constantine D. Spyropoulos

  • Affiliations:
  • Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece;Istituto di Informatica, Università di Ancona, Via Brecce Bianche, Ancona;Dip. di Scienze dell'Informazione, Università di Roma 'La Sapienza', Via Salaria 113, Roma;Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece;Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece;Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece

  • Venue:
  • SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2000

Abstract

The recognition of Proper Nouns (PNs) is considered an important task in Information Retrieval and Extraction. However, the high performance of most existing PN classifiers depends heavily on the availability of large dictionaries of domain-specific proper nouns, and on a certain amount of manual work for rule writing or manual tagging. Although relying on an existing PN dictionary is not a heavy requirement (such resources are often available on the web), its coverage of a domain corpus may be rather low in the absence of manual updating. In this paper we propose a technique for the automatic updating of a PN dictionary through the cooperation of an inductive and a probabilistic classifier. Our experiments show that, whenever an existing PN dictionary allows the identification of 50% of the proper nouns within a corpus, our technique allows, without additional manual effort, the successful recognition of about 90% of the remaining 50%.
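The reported figures compose multiplicatively: the dictionary covers a fraction of the PNs, and the update recovers a fraction of what the dictionary misses. The minimal sketch below makes that arithmetic explicit. The function name `combined_coverage` is hypothetical, it models only recognition coverage (the abstract does not report precision here), and it assumes the "about 90%" applies to the PNs not already found by the dictionary, as the abstract states.

```python
def combined_coverage(dictionary_coverage: float, recovered_fraction: float) -> float:
    """Overall fraction of corpus PNs recognized (hypothetical helper).

    dictionary_coverage: share of PNs already identified by the existing dictionary.
    recovered_fraction: share of the *remaining* PNs recognized after automatic updating.
    """
    if not (0.0 <= dictionary_coverage <= 1.0 and 0.0 <= recovered_fraction <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    remaining = 1.0 - dictionary_coverage  # PNs the dictionary misses
    return dictionary_coverage + recovered_fraction * remaining


# Figures quoted in the abstract: 50% initial coverage, about 90% of the rest recovered.
print(f"{combined_coverage(0.5, 0.9):.2f}")  # 0.95
```

Under these assumptions, the combined recognition rate is roughly 0.5 + 0.9 × 0.5 ≈ 95% of the proper nouns in the corpus.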