Parsing biomedical literature

Authors:
Matthew Lease; Eugene Charniak
Affiliations:
Brown Laboratory for Linguistic Information Processing (BLLIP), Brown University, Providence, RI (both authors)
Venue:
IJCNLP'05: Proceedings of the Second International Joint Conference on Natural Language Processing
Year:
2005

References: 16
Cited by: 36

Learning to Parse Natural Language with Maximum Entropy Models. Machine Learning, Special issue on natural language learning.
Using Combinatory Categorial Grammar to Extract Biomedical Information. IEEE Intelligent Systems.
Discriminative Reranking for Natural Language Parsing. ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning.
Implementation of the SMART Information Retrieval System.
Parsing inside-out.
Learning probabilistic lexicalized grammars for natural language processing.
Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, Special issue on using large corpora: II.
A maximum-entropy-inspired parser. NAACL 2000: Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference.
A comparison of parsing technologies for the biomedical domain. Natural Language Engineering.
Supervised and unsupervised PCFG adaptation to novel domains. NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1.
Example selection for bootstrapping statistical parsers. NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1.
Using predicate-argument structures for information extraction. ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, Volume 1.
Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics.
Exploring deep knowledge resources in biomedical name recognition. JNLPBA '04: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications.
Statistical parsing with a context-free grammar and word statistics. AAAI'97/IAAI'97: Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence.
Corpus-oriented grammar development for acquiring a head-driven phrase structure grammar from the Penn Treebank. IJCNLP'04: Proceedings of the First international joint conference on Natural Language Processing.


Abstract

We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1,2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this significantly impacts parse accuracy. To resolve this without requiring in-domain treebank data, we show how existing domain-specific lexical resources may be leveraged to augment PTB training: part-of-speech tags, dictionary collocations, and named entities. Using a state-of-the-art statistical parser [3] as our baseline, our lexically adapted parser achieves a 14.2% reduction in error. With oracle knowledge of named entities, this error reduction improves to 21.2%.
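
The two quantities the abstract relies on, lexical coverage and relative error reduction, can be illustrated with a minimal sketch. It assumes that lexical impoverishment is measured as the share of test tokens never seen in training, and that "reduction in error" means the fraction of the baseline's error (one minus its score) that the adapted parser removes. The function names, the toy sentences, and the scores 0.80 and 0.85 are hypothetical; they are not taken from the paper, PTB, or GENIA.

```python
def unknown_word_rate(train_tokens, test_tokens):
    """Fraction of test tokens whose word form never occurs in training."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


def relative_error_reduction(baseline_score, adapted_score):
    """Share of the baseline's error (1 - score) removed by the adapted system."""
    baseline_error = 1.0 - baseline_score
    adapted_error = 1.0 - adapted_score
    return (baseline_error - adapted_error) / baseline_error


# Toy data, invented for illustration only (not drawn from PTB or GENIA).
newswire = "the company said the shares rose".split()
abstract = "the protein binds the NF-kappaB promoter".split()

# Hypothetical scores, chosen only to exercise the formula.
print(unknown_word_rate(newswire, abstract))
print(relative_error_reduction(0.80, 0.85))
```

Under this reading, the reported 14.2% and 21.2% are relative to the baseline's error, not absolute gains in accuracy.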