The GENIA corpus: an annotated research abstract corpus in molecular biology domain

Authors:
Tomoko Ohta; Yuka Tateisi; Jin-Dong Kim
Affiliations:
University of Tokyo, Bunkyo-ku, Tokyo, Japan; CREST, JST, Bunkyo-ku, Tokyo, Japan; CREST, JST, Bunkyo-ku, Tokyo, Japan
Venue:
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Year:
2002

Abstract

With the information overload in genome-related fields, there is an increasing need for natural language processing (NLP) technology to extract information from the literature, and various attempts at information extraction using NLP have been made. We are developing the necessary resources, including a domain ontology and an annotated corpus built from research abstracts in the MEDLINE database (the GENIA corpus). We are building the ontology and the corpus simultaneously, with each informing the other. In this paper we report on our new corpus, its ontological basis, its annotation scheme, and statistics of the annotated objects. We also describe the tools used for corpus annotation and management.