Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods

Authors:
Georgios Petasis;Alessandro Cucchiarelli;Paola Velardi;Georgios Paliouras;Vangelis Karkaletsis;Constantine D. Spyropoulos
Affiliations:
Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece;Istituto di Informatica, Università di Ancona, Via Brecce Bianche, Ancona;Dip. di Scienze dell'Informazione, Università di Roma 'La Sapienza', Via Salaria 113, Roma;Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece;Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece;Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece
Venue:
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2000

Abstract

The recognition of Proper Nouns (PNs) is considered an important task in the area of Information Retrieval and Extraction. However, the high performance of most existing PN classifiers depends heavily on the availability of large dictionaries of domain-specific Proper Nouns, and on a certain amount of manual work for rule writing or manual tagging. Relying on an existing PN dictionary is not a heavy requirement, since such resources are often available on the web, but without manual updating its coverage of a domain corpus may be rather low. In this paper we propose a technique for the automatic updating of a PN Dictionary through the cooperation of an inductive and a probabilistic classifier. In our experiments we show that, whenever an existing PN Dictionary allows the identification of 50% of the proper nouns within a corpus, our technique allows, without additional manual effort, the successful recognition of about 90% of the remaining 50%.
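The figures in the last sentence can be combined into a single overall recognition rate. The short Python sketch below does that arithmetic under two assumptions that the abstract does not state explicitly: every proper noun identified by the existing dictionary counts as recognized, and the 90% rate applies uniformly to the proper nouns the dictionary misses. The function name `overall_recognition` is hypothetical and is not taken from the paper.

```python
def overall_recognition(dictionary_coverage: float, recovery_rate: float) -> float:
    """Share of all proper nouns recognized once the dictionary has been adapted.

    dictionary_coverage: fraction of corpus PNs the existing dictionary identifies.
    recovery_rate: fraction of the remaining PNs recognized by the technique.
    Assumption (not from the paper): dictionary hits all count as recognized.
    """
    if not (0.0 <= dictionary_coverage <= 1.0 and 0.0 <= recovery_rate <= 1.0):
        raise ValueError("both rates must lie in [0, 1]")
    missed = 1.0 - dictionary_coverage          # PNs the dictionary does not cover
    return dictionary_coverage + recovery_rate * missed


if __name__ == "__main__":
    # Figures quoted in the abstract: 50% coverage, about 90% of the remainder.
    print(f"{overall_recognition(0.5, 0.9):.2f}")  # prints 0.95
```

Under these assumptions, the reported setting corresponds to roughly 95% of the proper nouns in the corpus being recognized; this combined figure is derived here and is not quoted from the abstract.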