A comparison of parsing technologies for the biomedical domain

Authors:
Claire Grover;Alex Lascarides;Mirella Lapata
Affiliations:
School of Informatics, The University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK e-mail: C.Grover@ed.ac.uk, A.Lascarides@ed.ac.uk;School of Informatics, The University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK e-mail: C.Grover@ed.ac.uk, A.Lascarides@ed.ac.uk;Department of Computer Science, University of Sheffield, 11 Portobello Street, Sheffield S1 4DP, UK e-mail: mlap@dcs.shef.ac.uk
Venue:
Natural Language Engineering
Year:
2005

Abstract

This paper reports on experiments designed to investigate how well current NLP resources can syntactically and semantically analyse biomedical text. We address two tasks: (a) parsing a real corpus with a hand-built wide-coverage grammar, producing both syntactic analyses and logical forms, and (b) automatically computing the interpretation of compound nouns whose head is a nominalisation (e.g. "hospital arrival" means an arrival at hospital, while "patient arrival" means an arrival of a patient). For the former task we demonstrate that flexible yet constrained pre-processing techniques are crucial to success: they enable us to use part-of-speech tags to overcome inadequate lexical coverage, and to package up complex technical expressions prior to parsing so that they do not create misleading amounts of syntactic complexity. We argue that the XML-processing paradigm is ideally suited to automatically preparing the corpus for parsing. For the latter task, we compute interpretations of the compounds by exploiting surface cues and meaning paraphrases, which are in turn extracted from the parsed corpus. This provides an empirical setting in which we can compare the utility of a comparatively deep parser with that of a shallow one, exploring the trade-off between resolving attachment ambiguities on the one hand and generating errors in the parses on the other. We demonstrate that a model of the meaning of compound nominalisations is achievable with the aid of current broad-coverage parsers.
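
To make the second task concrete, the following is a minimal sketch of one way a compound such as "patient arrival" could be interpreted from parser output. It is an illustration under stated assumptions, not the model used in the paper: the tuple data, the relation labels ("subj", "obj", "at"), the NOMINALISATIONS table and the interpret function are all hypothetical, and the decision rule (pick the most frequent relation) is a deliberately simple stand-in for whatever statistics are computed over the parsed corpus.

```python
from collections import Counter

# Hypothetical (verb, relation, noun) tuples, standing in for grammatical
# relations extracted from a parsed corpus. The data are invented.
TUPLES = [
    ("arrive", "subj", "patient"),
    ("arrive", "subj", "patient"),
    ("arrive", "at", "hospital"),
    ("arrive", "at", "hospital"),
    ("arrive", "subj", "ambulance"),
    ("treat", "obj", "patient"),
]

# Hypothetical mapping from nominalised heads to their base verbs.
NOMINALISATIONS = {"arrival": "arrive", "treatment": "treat"}


def interpret(modifier, head, tuples=TUPLES):
    """Return the most frequent relation between the head's verb and the modifier.

    Returns None when the head is not a known nominalisation or when the
    verb and the modifier never co-occur in the tuples.
    """
    verb = NOMINALISATIONS.get(head)
    if verb is None:
        return None
    # Count how often each relation links this verb to this noun.
    counts = Counter(rel for v, rel, n in tuples if v == verb and n == modifier)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for modifier, head in [("patient", "arrival"), ("hospital", "arrival")]:
        print(modifier, head, "->", interpret(modifier, head))
```

In this toy setting the quality of the extracted tuples determines the answer, which is where the comparison between a deep and a shallow parser matters: attachment errors in the parses would change the counts that the decision rests on.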