Previous research has demonstrated the importance of handling differences between domains such as "newswire" and "biomedicine" when porting NLP systems from one domain to another. In this paper we identify the related issue of subdomain variation, i.e., differences between subsets of a domain that might be expected to behave homogeneously. Using a large corpus of research articles, we explore how subdomains of biomedicine vary across a variety of linguistic dimensions and find rich variation. We conclude that an awareness of such variation is necessary when deploying NLP systems for use in single or multiple subdomains.
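As an illustration only, one simple way to quantify how two subdomains differ along a single lexical dimension is to compare their word-frequency distributions. The sketch below assumes whitespace-tokenized text and uses Jensen-Shannon divergence (base 2, so scores fall between 0 for identical distributions and 1 for disjoint vocabularies) over unigram distributions. The choice of measure, the function names, and the toy "subdomain" sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def unigram_distribution(tokens):
    """Relative frequency of each token in a (hypothetical) subdomain sample."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jensen_shannon_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two unigram distributions."""
    vocab = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the shared vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a, b):
        # KL(a || b); every word with a[w] > 0 also has b[w] > 0 because b is the mixture.
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy, made-up samples standing in for biomedical subdomains.
subdomains = {
    "cardiology": "the patient showed reduced cardiac output after the infarction".split(),
    "genetics": "the mutation reduced expression of the gene in the knockout mice".split(),
    "oncology": "the tumour showed reduced growth after the treatment".split(),
}
dists = {name: unigram_distribution(toks) for name, toks in subdomains.items()}

for a, b in combinations(sorted(dists), 2):
    print(f"{a} vs {b}: JSD = {jensen_shannon_divergence(dists[a], dists[b]):.3f}")
```

A real comparison would use much larger samples and would look at further dimensions (e.g. syntactic or semantic features) rather than word frequencies alone; Jensen-Shannon divergence is used here only because it is symmetric and bounded, which makes pairwise subdomain scores easy to compare.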