We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable, but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar. Hand-crafted grammars inevitably lack coverage, but many coverage failures stem from inadequacies in their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach to pre-processing, helping to ameliorate the 'messiness' of real language data and improve parse performance.
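
As a rough illustration of what interfacing POS tag information with a lexicon could look like, the following minimal Python sketch falls back to a generic, tag-derived entry when a word is missing from a hand-built lexicon. It is not the ANLT interface or the method of the paper: the names `LEXICON`, `TAG_TO_CATEGORY` and `lookup`, the category labels, and the Penn Treebank-style tags are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical hand-built lexicon: lower-cased word form -> lexical categories.
# The category labels are invented for illustration only.
LEXICON: Dict[str, List[str]] = {
    "the": ["DET"],
    "patients": ["N[plural]"],
    "received": ["V[past]"],
}

# Hypothetical mapping from Penn Treebank-style POS tags to generic
# categories, used only when the lexicon has no entry for a word.
TAG_TO_CATEGORY: Dict[str, str] = {
    "NN": "N[singular]",
    "NNS": "N[plural]",
    "VBD": "V[past]",
    "JJ": "ADJ",
}


def lookup(word: str, pos_tag: str) -> List[str]:
    """Return lexicon entries for a word, or a tag-derived fallback entry."""
    entries = LEXICON.get(word.lower())
    if entries:
        # Known word: trust the hand-built lexicon.
        return entries
    # Unknown word: back off to a generic category based on the POS tag.
    generic = TAG_TO_CATEGORY.get(pos_tag)
    return [generic] if generic else []


# A tagged sentence in which "heparin" is absent from the lexicon.
tagged = [("The", "DT"), ("patients", "NNS"), ("received", "VBD"), ("heparin", "NN")]
analysis = [(word, lookup(word, tag)) for word, tag in tagged]
```

In this sketch a known word keeps its lexicon entries, while an unknown word such as "heparin" receives a single generic noun category from its tag, so the parser is not blocked by the missing entry.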