Classification from full text: a comparison of canonical sections of scientific papers
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Named entity recognition of gene names, protein names, cell lines, and other biologically relevant concepts has received significant attention from the research community. In this work, we consider named entity recognition of experimental techniques in biomedical articles. In our system for mining gene and disease associations, each association is categorized by the experimental techniques used to derive it. These categories are used to weight or remove associations, for example to remove associations derived from microarray experiments. We report on a pilot study to identify experimental techniques. Three main activities are discussed: manual annotation, lexicon-based tagging, and document classification. Analysis of the manual annotation points to several interesting linguistic characteristics. Two lexicon-based tagging approaches show little agreement with each other, suggesting that more sophisticated tagging algorithms may be necessary. Document classification using abstracts and titles is compared with full-text classification. In most cases, abstracts and titles perform comparably to full text.