Framework for Efficient Indexing and Searching of Scientific Metadata

Authors:
Chaitali Gupta;Madhusudhan Govindaraju
Venue:
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Year:
2010

Abstract

A seamless and intuitive data reduction capability for the vast amount of scientific metadata generated by experiments is critical to ensuring that domain scientists can use the data effectively. The portal environments and scientific gateways currently used by scientists provide search capability that is limited to the pre-defined pull-down menus and conditions set in the portal interface. As a result, effective data reduction is currently possible only for scientists who have developed expertise in complex and disparate query languages. A common theme in our discussions with scientists is that a data reduction capability similar to web search in ease of use, scalability, and freshness and accuracy of results is a critical need, one that can greatly enhance the productivity and quality of scientific research. Most existing search tools are designed for exact string matching, but exact matches are highly unlikely given the nature of metadata produced by instruments and users' inability to recall the exact numbers to search for in very large datasets. This paper presents research on locating metadata of interest within a range of values. To meet this goal, we leverage the use of XML in metadata descriptions of scientific datasets, specifically the NeXus datasets generated by scientists at the Spallation Neutron Source (SNS). We have designed a scalable indexing structure for processing data reduction queries. Web semantics and ontology-based methodologies are also employed to provide an elegant, intuitive, and powerful free-form, query-based data reduction interface to end users.
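
To make the idea of range search over XML metadata concrete, the following is a minimal sketch, not the indexing structure described in the paper. It assumes that numeric metadata values can be keyed by their element path and kept in sorted order, so that a range lookup reduces to two binary searches. The element names, run identifiers, and the helpers `build_index` and `range_query` are hypothetical and do not reflect the NeXus schema.

```python
import bisect
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical metadata records; tags and run ids are illustrative only.
DOCS = {
    "run_001": "<entry><temperature>293.4</temperature><wavelength>1.8</wavelength></entry>",
    "run_002": "<entry><temperature>310.0</temperature><wavelength>2.1</wavelength></entry>",
    "run_003": "<entry><temperature>275.9</temperature><wavelength>1.8</wavelength></entry>",
}


def _collect(elem, path, doc_id, postings):
    """Record every numeric leaf value under its slash-separated element path."""
    text = (elem.text or "").strip()
    if text:
        try:
            postings[path].append((float(text), doc_id))
        except ValueError:
            pass  # non-numeric text is ignored in this sketch
    for child in elem:
        _collect(child, f"{path}/{child.tag}", doc_id, postings)


def build_index(docs):
    """Return {path: (sorted values, doc ids in the same order)}."""
    postings = defaultdict(list)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        _collect(root, root.tag, doc_id, postings)
    index = {}
    for path, pairs in postings.items():
        pairs.sort()
        index[path] = ([v for v, _ in pairs], [d for _, d in pairs])
    return index


def range_query(index, path, low, high):
    """Return ids of documents whose value at `path` lies in [low, high]."""
    values, ids = index.get(path, ([], []))
    start = bisect.bisect_left(values, low)
    end = bisect.bisect_right(values, high)
    return ids[start:end]


index = build_index(DOCS)
print(range_query(index, "entry/temperature", 270, 300))  # ['run_003', 'run_001']
```

A user who remembers only that a run was recorded "around room temperature" can query an interval rather than an exact string, which is the kind of search the abstract argues exact matching cannot support. Results come back in ascending order of the indexed value.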