Sentence role identification in Medline abstracts: training classifier with structured abstracts

Authors:
Masashi Shimbo; Takahiro Yamasaki; Yuji Matsumoto
Affiliation (all authors):
Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
Venue:
AM'03: Proceedings of the Second International Conference on Active Mining
Year:
2003

Abstract

The abstract of a scientific paper typically consists of sentences describing the background of the study, its objective, the experimental method and results, and the conclusions. We discuss the task of identifying which of these “structural roles” each sentence in an abstract plays, with a particular focus on its application to building a literature retrieval system. By annotating the sentences in an abstract collection with role labels, we can build a literature retrieval system in which users specify the roles of the sentences in which query terms should be sought. We argue that this facility enables more goal-oriented search, and also makes it easier to narrow down search results when adding extra query terms does not help. To build such a system, two issues need to be addressed: (1) how to determine the set of structural roles presented to users, from which they choose the target search area, and (2) how to classify each sentence in an abstract by its structural role without relying heavily on human supervision. We view role identification as a text classification task based on supervised machine learning. Our approach is characterized by the use of structured abstracts to build training data. Structured abstracts, a format popular in the biomedical domain, explicitly mark their sections with headings indicating the sections' structural roles, and hence provide an inexpensive way to collect training data for sentence classifiers. Statistics on the structured abstracts contained in Medline also provide insight into determining the set of sections to present to users.
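The abstract does not specify the classifier or how headings are mapped to roles, so the following is only a minimal Python sketch of the general idea: harvest (sentence, role) pairs from structured abstracts by using each section heading as a free label, train a sentence classifier on them, then use the predicted roles to restrict keyword search. Everything concrete here is an assumption made for illustration: the heading table `HEADING_TO_ROLE`, the five-role set, the "HEADING: text" line layout, the naive sentence splitter, the multinomial naive Bayes classifier (a simple stand-in, not the authors' learner), the hypothetical `search` helper, and the invented toy abstracts.

```python
import math
import re
from collections import Counter, defaultdict

# HYPOTHETICAL mapping from structured-abstract headings to a five-role set.
# Real Medline headings are far more varied; this table is illustrative only.
HEADING_TO_ROLE = {
    "BACKGROUND": "BACKGROUND", "INTRODUCTION": "BACKGROUND",
    "OBJECTIVE": "OBJECTIVE", "AIM": "OBJECTIVE", "PURPOSE": "OBJECTIVE",
    "METHODS": "METHOD", "MATERIALS AND METHODS": "METHOD",
    "RESULTS": "RESULT",
    "CONCLUSION": "CONCLUSION", "CONCLUSIONS": "CONCLUSION",
}
# Assumed layout: each section on its own line as "HEADING: text".
HEADING_RE = re.compile(r"^([A-Z][A-Z ]+):\s*(.*)$")


def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lower-cased alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", sentence.lower())


def training_pairs(structured_abstract):
    """Turn one structured abstract into (sentence, role) training pairs,
    using each section heading as the label of every sentence under it."""
    pairs = []
    for line in structured_abstract.splitlines():
        m = HEADING_RE.match(line.strip())
        if not m:
            continue
        role = HEADING_TO_ROLE.get(m.group(1).strip())
        if role is None:
            continue  # heading outside the chosen role set: skip the section
        pairs.extend((s, role) for s in split_sentences(m.group(2)))
    return pairs


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, pairs):
        self.class_counts = Counter(role for _, role in pairs)
        self.word_counts = defaultdict(Counter)
        for sentence, role in pairs:
            self.word_counts[role].update(tokenize(sentence))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total = len(pairs)
        return self

    def predict(self, sentence):
        # Words never seen in training are ignored.
        words = [w for w in tokenize(sentence) if w in self.vocab]
        best, best_score = None, -math.inf
        for role, n in self.class_counts.items():
            counts = self.word_counts[role]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / self.total)  # log prior
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best, best_score = role, score
        return best


def search(labelled, term, roles):
    """Hypothetical role-restricted search: sentences containing `term`
    whose (predicted) role is in `roles`."""
    term = term.lower()
    return [s for s, r in labelled if r in roles and term in tokenize(s)]


# Invented toy data for illustration only.
structured = """BACKGROUND: Aspirin is widely used. Its effect on stroke is unclear.
OBJECTIVE: To assess aspirin in stroke prevention.
METHODS: We randomized 200 patients. Outcomes were recorded for one year.
RESULTS: Aspirin reduced stroke incidence.
CONCLUSIONS: Aspirin may prevent stroke."""

pairs = training_pairs(structured)
clf = NaiveBayes().fit(pairs)

# Label an unstructured abstract with the trained classifier, then search
# for "aspirin" only in sentences classified as results or conclusions.
unstructured = ("Stroke is a leading cause of death. "
                "We randomized patients to aspirin. "
                "Aspirin reduced stroke incidence.")
labelled = [(s, clf.predict(s)) for s in split_sentences(unstructured)]
hits = search(labelled, "aspirin", {"RESULT", "CONCLUSION"})
print(labelled)
print(hits)
```

In this toy run, a plain keyword search for "aspirin" would match two of the three unstructured sentences, while restricting it to results and conclusions keeps only one, which illustrates the kind of narrowing the abstract argues for. A real system would need far more training abstracts, a much richer heading-to-role mapping, and a stronger classifier.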