Identifying references to datasets in publications

Authors:
Katarina Boland;Dominique Ritze;Kai Eckert;Brigitte Mathiak
Affiliations:
GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany;Mannheim University Library, Mannheim, Germany;Mannheim University Library, Mannheim, Germany;GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
Venue:
TPDL'12 Proceedings of the Second international conference on Theory and Practice of Digital Libraries
Year:
2012

Abstract

Research data and publications are usually stored in separate and structurally distinct information systems. Links between these resources are often not explicitly available, which complicates the search for previous research. In this paper, we propose a pattern induction method for detecting study references in full texts. Because these references are not specified in a standardized way and may occur in a variety of contexts (i.e., captions, footnotes, or continuous text), our algorithm must induce very flexible patterns. To overcome the sparse distribution of training instances, we induce patterns iteratively using a bootstrapping approach. We show that our method achieves promising results for the automatic identification of data references and is a first step towards building an integrated information system.
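The iterative idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the context window size, the regular-expression form of the patterns, the function names (`induce_patterns`, `apply_patterns`, `bootstrap`) and the toy sentences are all assumptions made for illustration. Starting from one known study name, the sketch turns the words surrounding each known name into a pattern, applies the patterns to find further names, and repeats until nothing new is found.

```python
import re

# Toy corpus, invented for illustration only.
SENTENCES = [
    "We use data from ALLBUS 2008 for the analysis.",
    "Our models use data from Eurobarometer 2008 as input.",
    "Results are based on the Eurobarometer survey of that year.",
    "Figures are based on the SOEP survey and cover Germany.",
]


def induce_patterns(sentences, names, window=2):
    """Build one regex per occurrence of a known name from its neighbouring words.

    Assumption: `window` words to the left and one word to the right form the context.
    """
    patterns = set()
    for sentence in sentences:
        for name in names:
            for match in re.finditer(r"\b" + re.escape(name) + r"\b", sentence):
                left = sentence[:match.start()].split()[-window:]
                right = sentence[match.end():].split()[:1]
                if len(left) < window or not right:
                    continue  # not enough context to form a pattern
                prefix = r"\s+".join(re.escape(word) for word in left)
                suffix = re.escape(right[0])
                # The name slot is simplified to a single capitalised token.
                patterns.add(prefix + r"\s+([A-Z][\w-]*)\s+" + suffix)
    return patterns


def apply_patterns(sentences, patterns):
    """Return every token that fills the name slot of any pattern."""
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            for match in re.finditer(pattern, sentence):
                found.add(match.group(1))
    return found


def bootstrap(sentences, seeds, max_iterations=5):
    """Alternate between pattern induction and name extraction until nothing new appears."""
    names = set(seeds)
    for _ in range(max_iterations):
        patterns = induce_patterns(sentences, names)
        new_names = apply_patterns(sentences, patterns) - names
        if not new_names:
            break
        names |= new_names
    return names


print(sorted(bootstrap(SENTENCES, {"ALLBUS"})))
# prints ['ALLBUS', 'Eurobarometer', 'SOEP']
```

In this toy run the seed "ALLBUS" yields a pattern that finds "Eurobarometer", and a second context of "Eurobarometer" yields a pattern that finds "SOEP". The sketch accepts every match without scoring or filtering patterns, so on real text it would quickly pick up false positives; a practical system would need some form of pattern and candidate validation.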