This paper classifies six publicly available biomedical corpora according to various corpus design features and characteristics. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have implications for the design of the next generation of biomedical corpora.