Identifying non-elliptical entity mentions in a coordinated NP with ellipses

Authors:
Jeongmin Chae;Younghee Jung;Taemin Lee;Soonyoung Jung;Chan Huh;Gilhan Kim;Hyeoncheol Kim;Heungbum Oh
Venue:
Journal of Biomedical Informatics
Year:
2014

Abstract

Named entities in the biomedical domain are often written as a coordinated noun phrase (NP) joined by a coordinating conjunction such as 'and' or 'or'. In such phrases, words shared among the named entity mentions are frequently omitted (ellipsis), which makes the individual named entities difficult to identify. Although various Named Entity Recognition (NER) methods have tried to solve this problem, they can only deal with relatively simple elliptical patterns in coordinated NPs. We propose a new NER method that uses linguistic rules and an entity mention dictionary to identify non-elliptical entity mentions in coordinated NPs with simple or complex ellipses. The GENIA and CRAFT corpora were used to evaluate the performance of the proposed system. The GENIA corpus, which comprises 3434 non-elliptical entity mentions in 1585 coordinated NPs with ellipses, was used to evaluate the performance of the system according to the quality of the dictionary. On GENIA, the system achieved 92.11% precision, 95.20% recall, and 93.63% F-score in identifying non-elliptical entity mentions in coordinated NPs. Its accuracy in resolving simple and complex ellipses was 94.54% and 91.95%, respectively. The CRAFT corpus was used to evaluate the performance of the system under realistic conditions, where the system achieved 78.47% precision, 67.10% recall, and 72.34% F-score in coordinated NPs. These evaluations show that the system effectively resolves the problem caused by ellipses and improves NER performance. The algorithm is implemented in PHP and the code can be downloaded from https://code.google.com/p/medtextmining/.
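
To make the task concrete, the following is a minimal Python sketch of ellipsis resolution for the simple case only. It is not the authors' PHP implementation, and the abstract does not state the actual linguistic rules. The function names (split_conjuncts, resolve), the toy ENTITY_DICTIONARY, and the two borrowing heuristics (a shared head taken from the last conjunct, a shared modifier taken from the first conjunct) are all assumptions made for illustration.

```python
import re

# Hypothetical toy dictionary of entity mentions (illustrative, not from the paper).
ENTITY_DICTIONARY = {
    "B cells", "T cells",
    "interleukin 2", "interleukin 4",
    "alpha chains", "beta chains", "gamma chains",
}


def split_conjuncts(np_text):
    """Split a coordinated NP on commas and the conjunctions 'and' / 'or'."""
    parts = re.split(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", np_text.strip())
    return [p for p in parts if p]


def resolve(np_text, dictionary):
    """Expand each conjunct into a full mention that the dictionary confirms.

    Assumed heuristics (not the paper's rules):
      * shared head: borrow trailing tokens of the last conjunct
      * shared modifier: borrow leading tokens of the first conjunct
    Conjuncts with no dictionary-confirmed expansion are dropped.
    """
    conjuncts = split_conjuncts(np_text)
    if not conjuncts:
        return []
    first_toks = conjuncts[0].split()
    last_toks = conjuncts[-1].split()
    resolved = []
    for i, conjunct in enumerate(conjuncts):
        candidates = [conjunct]  # the conjunct may already be a full mention
        if i < len(conjuncts) - 1:
            # e.g. "B" + "cells" borrowed from the last conjunct "T cells"
            candidates += [
                f"{conjunct} {' '.join(last_toks[k:])}"
                for k in range(1, len(last_toks))
            ]
        if i > 0:
            # e.g. "interleukin" borrowed from the first conjunct "interleukin 2"
            candidates += [
                f"{' '.join(first_toks[:k])} {conjunct}"
                for k in range(len(first_toks) - 1, 0, -1)
            ]
        match = next((c for c in candidates if c in dictionary), None)
        if match is not None:
            resolved.append(match)
    return resolved


if __name__ == "__main__":
    for phrase in ("B and T cells", "interleukin 2 and 4",
                   "alpha, beta, and gamma chains"):
        print(phrase, "->", resolve(phrase, ENTITY_DICTIONARY))
```

In this sketch the dictionary acts as the filter that decides which candidate expansion counts as a real entity mention, which is one plausible reading of why the abstract evaluates performance according to dictionary quality. Complex ellipses, where omitted words occur on both sides or inside a conjunct, are outside what this sketch handles.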