Handling Conjunctions in Named Entities

Authors:
Robert Dale;Paweł Mazur
Affiliations:
Centre for Language Technology, Macquarie University, NSW 2109, Sydney, Australia;Centre for Language Technology, Macquarie University, NSW 2109, Sydney, Australia and Institute of Applied Informatics, Wrocław University of Technology, Wyb. Wyspiańskiego 27, 50-370 Wr ...
Venue:
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

Although the literature reports very high accuracy figures for the recognition of named entities in text, some named entity phenomena remain problematic for existing text processing systems. One of these is the ambiguity of conjunctions in candidate named entity strings, an all-too-prevalent problem in corporate and legal documents. In this paper, we distinguish four uses of the conjunction in these strings and explore a supervised machine learning approach to conjunction disambiguation, trained on a very limited set of 'name-internal' features, which avoids the need for expensive lexical or semantic resources. We achieve 84% correctly classified examples using k-fold evaluation on a data set of 600 instances. Further improvements are likely to require the use of wider domain knowledge and name-external features.
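
To make the idea of name-internal features concrete, the following is a minimal sketch in Python. It is not the feature set or the classifier used in the paper: the feature names, the `COMPANY_SUFFIXES` list, and the choice of k = 10 in the usage lines are all assumptions made for illustration. It shows only how features might be computed from the candidate string alone, without external lexical resources, and how a data set of 600 instances could be partitioned for k-fold evaluation.

```python
# Illustrative only: hypothetical features, not those reported in the paper.
CONJUNCTIONS = {"and", "&"}
# Assumed, deliberately tiny suffix list standing in for a corporate-name cue.
COMPANY_SUFFIXES = {"ltd", "limited", "inc", "corp", "group", "plc"}


def name_internal_features(candidate: str) -> dict:
    """Compute features from the tokens of the candidate string only."""
    tokens = candidate.split()
    positions = [i for i, t in enumerate(tokens) if t.lower() in CONJUNCTIONS]
    if not positions:
        raise ValueError("candidate string contains no conjunction")
    idx = positions[0]  # consider the first conjunction only
    left, right = tokens[:idx], tokens[idx + 1:]
    return {
        "conjunction": tokens[idx].lower(),
        "left_len": len(left),
        "right_len": len(right),
        "left_capitalised": all(t[0].isupper() for t in left),
        "right_capitalised": all(t[0].isupper() for t in right),
        "ends_with_company_suffix":
            tokens[-1].rstrip(".").lower() in COMPANY_SUFFIXES,
    }


def k_fold_indices(n: int, k: int):
    """Yield (train, test) index lists for k-fold evaluation over n items."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


features = name_internal_features(
    "Australia and New Zealand Banking Group Limited"
)
splits = list(k_fold_indices(600, 10))  # k = 10 is an assumed value
```

Each feature dictionary would then be passed to whichever supervised classifier is being evaluated, with one model trained per fold and the per-fold accuracies averaged.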