User-driven development of text mining resources for cancer risk assessment

Authors:
Lin Sun;Anna Korhonen;Ilona Silins;Ulla Stenius
Affiliations:
University of Cambridge, Cambridge, UK;University of Cambridge, Cambridge, UK;Institute of Environmental Medicine, Stockholm, Sweden;Institute of Environmental Medicine, Stockholm, Sweden
Venue:
BioNLP '09 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
Year:
2009

Abstract

One of the most neglected areas of biomedical Text Mining (TM) is the development of systems based on carefully assessed user needs. We investigate the needs of an important task yet to be tackled by TM, Cancer Risk Assessment (CRA), and take the first step towards the development of TM for the task: identifying and organizing the scientific evidence required for CRA in a taxonomy. The taxonomy is based on expert annotation of 1297 MEDLINE abstracts. We report promising results with inter-annotator agreement tests and automatic classification experiments, and a user test which demonstrates that the resources we have built are well-defined, accurate, and applicable to a real-world CRA scenario. We discuss extending and refining the taxonomy further via manual and machine learning approaches, and the subsequent steps required to develop TM for the needs of CRA.