Exposing the hidden web for chemical digital libraries

Authors:
Sascha Tönnies;Benjamin Köhncke;Oliver Koepler;Wolf-Tilo Balke
Affiliations:
L3S Research Center, Hannover, Germany;L3S Research Center, Hannover, Germany;TIB Hannover, Hannover, Germany;TU Braunschweig, Braunschweig, Germany
Venue:
Proceedings of the 10th annual joint conference on Digital libraries
Year:
2010

References (7):

SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules
Journal of Chemical Information & Computer Sciences

An annotation scheme for discourse-level argumentation in research articles
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics

Query Chem: a Google-powered web search combining text and chemical structures
Bioinformatics

Extraction and search of chemical formulae in text documents on the web
Proceedings of the 16th international conference on World Wide Web

Mining, indexing, and searching for textual chemical molecule information on the web
Proceedings of the 17th international conference on World Wide Web

Semantic annotation of papers: interface & enrichment tool (SAPIENT)
BioNLP '09 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

High-throughput identification of chemistry in life science texts
CompLife'06 Proceedings of the Second international conference on Computational Life Sciences

Cited by (3):

Using Wikipedia categories for compact representations of chemical documents
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Taking chemistry to the task: personalized queries for chemical digital libraries
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries

Catching the drift --- indexing implicit knowledge in chemical digital libraries
TPDL'12 Proceedings of the Second international conference on Theory and Practice of Digital Libraries

Abstract

In recent years, the vast amount of digitally available content has led to the creation of many topic-centered digital libraries. In chemistry, too, more and more digital collections are available, but complex query formulation still hampers their intuitive use. This is because information seeking in chemical documents focuses on chemical entities, for which standard search currently relies on complex structures that are hard to extract from documents. Moreover, although simple keyword searches would often be sufficient, current collections cannot be indexed by Web search providers because chemical substance names are ambiguous. In this paper we present a framework for automatically generating metadata-enriched index pages for all documents in a given chemical collection. All information is linked to the respective documents, providing an easy-to-crawl metadata repository that promises to open up digital chemical libraries. Our experiments, which index an open access journal, show not only that the documents can be found with a simple Google search via the automatically created index pages, but also that this search clearly outperforms full-text indexing in both precision/recall and performance. Finally, we compare our indexing against a classical structure search and find that keyword-based search can indeed solve at least some of the daily tasks in chemical workflows. Using our framework thus promises to expose a large part of the still hidden chemical Web, which makes the techniques employed interesting for chemical information providers such as digital libraries and open access journals.
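
The idea of a metadata-enriched index page can be illustrated with a minimal sketch. This is not the authors' implementation: the `DocumentRecord` type, the `render_index_page` function, and the page layout are all hypothetical, and the chemical names are assumed to have been extracted from the document beforehand. The sketch only shows how one crawlable HTML page per document could carry the substance names and link back to the document itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html import escape


@dataclass
class DocumentRecord:
    """Hypothetical record for one document in a chemical collection."""
    title: str
    url: str
    # Substance names or synonyms, assumed to be extracted in an earlier step.
    chemical_names: list[str] = field(default_factory=list)


def render_index_page(doc: DocumentRecord) -> str:
    """Render a small HTML page that links extracted metadata to the document."""
    keywords = ", ".join(doc.chemical_names)
    items = "\n".join(f"    <li>{escape(name)}</li>" for name in doc.chemical_names)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"  <title>{escape(doc.title)}</title>\n"
        f'  <meta name="keywords" content="{escape(keywords)}">\n'
        "</head>\n<body>\n"
        # The heading links back to the original document in the collection.
        f'  <h1><a href="{escape(doc.url)}">{escape(doc.title)}</a></h1>\n'
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    example = DocumentRecord(
        title="Example article",
        url="https://example.org/article/1",
        chemical_names=["aspirin", "acetylsalicylic acid"],
    )
    print(render_index_page(example))
```

Listing the names as plain text is a deliberate choice in this sketch: a keyword-based Web crawler can index them directly, whereas structure representations would need a dedicated search engine.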