A distributional semantics approach to simultaneous recognition of multiple classes of named entities

Authors:
Siddhartha Jonnalagadda;Robert Leaman;Trevor Cohen;Graciela Gonzalez
Affiliations:
Arizona State University;Arizona State University;The University of Texas Health Science Center at Houston;Arizona State University
Venue:
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2010


Abstract

Named Entity Recognition and Classification has been studied for the last two decades. Because semantic features take a huge amount of training time and are slow at inference, existing tools apply features and rules mainly at the word level or use lexicons. Recent advances in distributional semantics allow us to efficiently create paradigmatic models that encode word order. We used Sahlgren et al.'s permutation-based variant of the Random Indexing model to create a scalable and efficient system that simultaneously recognizes multiple entity classes mentioned in natural language. The system is validated on the GENIA corpus, which has annotations for 46 biomedical entity classes and supports nested entities. Using distributional semantics features only, it achieves an overall micro-averaged F-measure of 67.3% based on fragment matching, with performance ranging from 7.4% for “DNA substructure” to 80.7% for “Bioentity”.
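
To make the idea of a permutation-based Random Indexing model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a cyclic shift as the permutation, a small symmetric context window, and illustrative values for the dimensionality and sparsity; the names `index_vector`, `rotate`, `train`, and `cosine` are hypothetical. Each word gets a sparse random index vector, and a target word's context vector accumulates its neighbours' index vectors, each shifted by the neighbour's relative position so that word order is encoded.

```python
import random
from collections import defaultdict

DIM = 512      # vector dimensionality (illustrative assumption)
NONZERO = 8    # number of +1/-1 entries per index vector (illustrative assumption)
WINDOW = 2     # words considered on each side of the target (illustrative assumption)


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Sparse ternary random vector, seeded by the word so it is reproducible."""
    rng = random.Random(word)
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def rotate(vec, k):
    """Cyclic shift by k positions (positive = right, negative = left)."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def train(sentences, window=WINDOW):
    """Build context vectors; each neighbour is shifted by its relative offset."""
    context = defaultdict(lambda: [0] * DIM)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or not 0 <= j < len(tokens):
                    continue
                shifted = rotate(index_vector(tokens[j]), offset)
                ctx = context[target]
                for d, value in enumerate(shifted):
                    ctx[d] += value
    return dict(context)


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus: two tokens that occur in the same ordered contexts.
sentences = [
    ["the", "IL-2", "gene", "is", "expressed"],
    ["the", "NF-kB", "gene", "is", "expressed"],
]
model = train(sentences)
similarity = cosine(model["IL-2"], model["NF-kB"])
```

In this toy corpus, "IL-2" and "NF-kB" have the same neighbours at the same relative positions, so their context vectors coincide. Such vectors could then serve as features for an entity classifier; how the paper turns them into class predictions is not described in the abstract and is not sketched here.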