It is not clear a priori how well parsers trained on the Penn Treebank will parse significantly different corpora without retraining. We carried out a competitive evaluation of three leading treebank parsers on an annotated corpus from the human molecular biology domain, and on an extract from the Penn Treebank for comparison. We performed a detailed analysis of the kinds of errors each parser made, along with a quantitative comparison of syntax usage between the two corpora. Our results suggest that these tools are becoming somewhat over-specialised on their training domain at the expense of portability, but also indicate that some of the errors encountered are of doubtful importance for information extraction tasks. Furthermore, our initial experiments with unsupervised parse combination techniques showed that integrating the output of several parsers can ameliorate some of the performance problems they encounter on unfamiliar text, providing accuracy and coverage improvements, and a novel measure of trustworthiness. Supplementary materials are available at http://textmining.cryst.bbk.ac.uk/ac105/.