Digital preservation and knowledge discovery based on documents from an international health science program

Authors:
Dharitri Misra;Robert H. Hall;Susan M. Payne;George R. Thoma
Affiliations:
National Institutes of Health, Bethesda, MD, USA;National Institutes of Health, Bethesda, MD, USA;National Institutes of Health, Bethesda, MD, USA;National Institutes of Health, Bethesda, MD, USA
Venue:
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Year:
2012

Abstract

Important biomedical information is often recorded, published or archived in unstructured and semi-structured textual form. Artificial intelligence and knowledge discovery techniques may be applied to large volumes of such data to identify and extract useful metadata, not only for providing access to these documents, but also for conducting analyses and uncovering patterns and trends in a field. The System for Preservation of Electronic Resources (SPER), an information management tool developed at the U.S. National Library of Medicine, provides these capabilities by integrating machine learning, data mining and digital preservation techniques. In this paper, we present an overview of SPER and its ability to retrieve information from one such dataset. We show how SPER was applied to the semi-structured records of an international health science program, the 46-year continuous archive of conference publications and related documents from the Joint Cholera Panel of the U.S.-Japan Cooperative Medical Science Program (CMSP). We explain the techniques by which metadata was extracted automatically from the semi-structured document contents to preserve these publications, and show how this metadata was used to describe quantitatively the activity of a research community, as a preliminary study of a subset of the program's specific health science goals.