Distribution of information in biomedical abstracts and full-text publications

Authors:
M. J. Schuemie;M. Weeber;B. J. A. Schijvenaars;E. M. Van Mulligen;C. C. Van Der Eijk;R. Jelier;B. Mons;J. A. Kors
Affiliations:
Department of Medical Informatics, Erasmus University Medical Center Rotterdam, P.O. Box 1738, 3000 DR, Rotterdam, The Netherlands (all authors)
Venue:
Bioinformatics
Year:
2004

Citing: 0
Cited by: 12

Discovering implicit associations among critical biological entities
International Journal of Data Mining and Bioinformatics

Using argumentation to retrieve articles with similar citations from MEDLINE
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications

Combining multiple evidence for gene symbol disambiguation
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing

Challenges for extracting biomedical knowledge from full text
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing

Corpus design for biomedical natural language processing
ISMB '05 Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics

Expanded information retrieval using full-text searching
Journal of Information Science

Multistage Gene Normalization and SVM-Based Ranking for Protein Interactor Extraction in Full-Text Articles
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Building a coreference-annotated corpus from the domain of biochemistry
BioNLP '11 Proceedings of BioNLP 2011 Workshop

GetItFull – a tool for downloading and pre-processing full-text journal articles
KDLL'06 Proceedings of the 2006 international conference on Knowledge Discovery in Life Science Literature

Latent argumentative pruning for compact MEDLINE indexing
AIME'05 Proceedings of the 10th conference on Artificial Intelligence in Medicine

Abstracts versus full texts and patents: a quantitative analysis of biomedical entities
IRFC'10 Proceedings of the First international Information Retrieval Facility conference on Advances in Multidisciplinary Retrieval

Localised topic information extraction for summarisation using syntactic sequences
International Journal of Knowledge and Web Intelligence

Abstract

Motivation: Full-text documents potentially hold more information than their abstracts, but require more resources for processing. We investigated the added value of full text over abstracts in terms of information content and occurrences of gene symbol-gene name combinations that can resolve gene-symbol ambiguity.

Results: We analyzed a set of 3902 biomedical full-text articles. Different keyword measures indicate that information density is highest in abstracts, but that the information coverage in full texts is much greater than in abstracts. Analysis of five different standard sections of articles shows that the highest information coverage is located in the results section. Still, 30-40% of the information mentioned in each section is unique to that section. Only 30% of the gene symbols in the abstract are accompanied by their corresponding names, and a further 8% of the gene names are found in the full text. In the full text, only 18% of the gene symbols are accompanied by their gene names.
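
The abstract does not define its keyword measures, so the following is only an illustrative sketch of how "density", "coverage", and "unique to that section" might be computed per section. The definitions are assumptions, not the authors' method: density is taken as keyword tokens divided by all tokens in a section, coverage as the share of the article's distinct keywords found in that section, and the unique fraction as the share of a section's distinct keywords that appear in no other section. The function name `keyword_stats`, the simple tokenizer, and the toy texts are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_stats(sections: Dict[str, str], keywords: Set[str]) -> Dict[str, Dict[str, float]]:
    """Compute assumed density, coverage, and unique-fraction measures per section."""
    tokens = {name: tokenize(text) for name, text in sections.items()}
    # Distinct keywords found in each section.
    found = {name: set(toks) & keywords for name, toks in tokens.items()}
    # Distinct keywords found anywhere in the article.
    all_found: Set[str] = set().union(*found.values())

    stats: Dict[str, Dict[str, float]] = {}
    for name, toks in tokens.items():
        keyword_tokens = sum(1 for t in toks if t in keywords)
        # Keywords seen in any section other than this one.
        elsewhere: Set[str] = set().union(
            *(kws for other, kws in found.items() if other != name)
        )
        unique_here = found[name] - elsewhere
        stats[name] = {
            "density": keyword_tokens / len(toks) if toks else 0.0,
            "coverage": len(found[name]) / len(all_found) if all_found else 0.0,
            "unique_fraction": len(unique_here) / len(found[name]) if found[name] else 0.0,
        }
    return stats


# Toy example (invented text, not data from the study).
sections = {
    "abstract": "BRCA1 mutation increases cancer risk",
    "results": (
        "The BRCA1 mutation was found in tumor samples; "
        "TP53 expression was measured in the same samples"
    ),
}
keywords = {"brca1", "mutation", "cancer", "tp53", "tumor"}
stats = keyword_stats(sections, keywords)
for name, values in stats.items():
    print(name, values)
```

In this toy input the short abstract has the higher density while the longer results text has the higher coverage, which mirrors the direction of the reported finding without reproducing any of its numbers.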