oreChem ChemXSeer: a semantic digital library for chemistry

Authors:
Na Li;Leilei Zhu;Prasenjit Mitra;Karl Mueller;Eric Poweleit;C. Lee Giles
Affiliations:
The Pennsylvania State University, University Park, PA, USA;The Pennsylvania State University, University Park, PA, USA;The Pennsylvania State University, University Park, PA, USA;The Pennsylvania State University, University Park, PA, USA;The Pennsylvania State University, University Park, PA, USA;The Pennsylvania State University, University Park, PA, USA
Venue:
Proceedings of the 10th annual joint conference on Digital libraries
Year:
2010


Abstract

Representing the semantics of unstructured scientific publications will facilitate access and search and may lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata, even for scientific publications that potentially contain rich semantic metadata. In addition, how to search the scientific literature through linked semantic metadata is an open problem. We have developed a semantic digital library, oreChem ChemXSeer, that models chemistry papers with semantic metadata. It stores and indexes metadata extracted from a chemistry paper repository, ChemXSeer, using "compound objects". We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standard (http://www.openarchives.org/ore/) to define a compound object that aggregates metadata fields related to a digital object. Aggregated metadata can be managed and retrieved as one unit, which improves ease of use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query. We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier whose features are words that are typically used only to describe experiments, such as "apparatus" and "prepare". Using a dataset composed of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents.
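
The two ideas in the abstract, cue-word features for experiment paragraphs and aggregation of metadata into one retrievable unit, can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only "apparatus" and "prepare" come from the abstract; the other cue words, the function names, the `min_hits` threshold, and the dictionary layout are assumptions made for illustration. A simple threshold rule stands in for the trained classifier, and the dictionary only mimics the idea of a compound object; it is not an OAI-ORE serialization.

```python
import re

# Cue words: "apparatus" and "prepare" are from the abstract;
# the remaining entries are hypothetical additions for illustration.
EXPERIMENT_CUES = ("apparatus", "prepare", "stirred", "dissolved", "synthesized")


def cue_features(paragraph):
    """Count, for each cue word, the tokens that start with it."""
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    return {
        cue: sum(tok.startswith(cue) for tok in tokens)
        for cue in EXPERIMENT_CUES
    }


def is_experiment_paragraph(paragraph, min_hits=2):
    """Threshold rule standing in for a trained classifier (assumption)."""
    return sum(cue_features(paragraph).values()) >= min_hits


def build_aggregation(doc_id, title, paragraphs):
    """Group document metadata and tagged paragraphs into a single unit.

    The layout is hypothetical and only loosely mirrors a compound object.
    """
    return {
        "id": doc_id,
        "title": title,
        "experiment_paragraphs": [
            p for p in paragraphs if is_experiment_paragraph(p)
        ],
    }
```

Calling `build_aggregation` with a document identifier, a title, and a list of paragraphs returns one dictionary holding the title together with the paragraphs flagged as experiment-related, so that a single lookup retrieves the linked fields.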