What's in a name?: entity type variation across two biomedical subdomains

Authors:
Claudiu Mihăilă;Riza Theresa Batista-Navarro
Affiliations:
University of Manchester, Manchester, UK;University of Manchester, Manchester, UK
Venue:
EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

Abstract

Lexical, syntactic, semantic and discourse variations exist amongst the languages used in different biomedical subdomains. It is important to recognise such differences and to understand that biomedical tools that work well on some subdomains may not work as well on others. We report here on the semantic variations that occur in the sublanguages of two biomedical subdomains, namely cell biology and pharmacology, at the level of named entity information. By building a classifier that uses ratios of named entities as features, we show that named entity information can discriminate between documents from the two subdomains. More specifically, our classifier distinguishes between documents belonging to each subdomain with an F-score of 91.1%.
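The idea of using named entity ratios as classification features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which entity types, feature definitions or learning algorithm were used. The entity type names, the toy documents and the nearest-centroid classifier below are all assumptions chosen to keep the example short and dependency-free.

```python
from collections import Counter
from math import dist

# Hypothetical entity types; the abstract does not list the real inventory.
VOCAB = ["protein", "cell", "drug", "disease"]


def entity_ratios(entity_types, vocabulary=VOCAB):
    """Turn a document's entity-type labels into a vector of type ratios."""
    counts = Counter(entity_types)
    total = sum(counts[t] for t in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[t] / total for t in vocabulary]


def train_centroids(docs, labels, vocabulary=VOCAB):
    """Average the ratio vectors per class (a stand-in for a real learner)."""
    sums = {}
    sizes = Counter(labels)
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, [0.0] * len(vocabulary))
        for i, value in enumerate(entity_ratios(doc, vocabulary)):
            acc[i] += value
    return {lab: [v / sizes[lab] for v in acc] for lab, acc in sums.items()}


def classify(doc, centroids, vocabulary=VOCAB):
    """Assign the class whose centroid is closest in Euclidean distance."""
    vec = entity_ratios(doc, vocabulary)
    return min(centroids, key=lambda lab: dist(vec, centroids[lab]))


# Toy, invented training data: each document is a list of entity-type labels.
train_docs = [
    ["protein", "protein", "cell", "drug"],
    ["protein", "cell", "cell"],
    ["drug", "drug", "disease", "protein"],
    ["drug", "disease", "disease"],
]
train_labels = ["cell_biology", "cell_biology", "pharmacology", "pharmacology"]

centroids = train_centroids(train_docs, train_labels)
print(classify(["protein", "cell", "protein"], centroids))
print(classify(["drug", "drug", "disease"], centroids))
```

Ratios rather than raw counts keep documents of different lengths comparable, which is presumably why ratio features suit this kind of subdomain comparison. Any standard classifier could replace the centroid step.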