A methodology for semantic integration of metadata in bioinformatics data sources

Authors:
Lei Li;Roop G. Singh;Guangzhi Zheng;Art Vandenberg;Vijay Vaishnavi;Sham Navathe
Affiliations:
Georgia State University, Atlanta, GA;Georgia State University, Atlanta, GA;Georgia State University, Atlanta, GA;Georgia State University, Atlanta, GA;Georgia State University, Atlanta, GA;Georgia Institute of Technology, Atlanta, Georgia
Venue:
Proceedings of the 43rd annual Southeast regional conference - Volume 1
Year:
2005


Abstract

Semantic heterogeneity is becoming increasingly prominent in bioinformatics domains, which deal with constantly expanding, dynamic, and often very large datasets from various distributed sources. Metadata is the key component for effective information integration. Traditional approaches to reconciling semantic heterogeneity rely on standards or on mediation-based methods. These approaches have had limited success in addressing the general semantic heterogeneity problem, and by themselves they are unlikely to succeed in bioinformatics domains, where one faces the additional complexity of keeping pace with the speed at which data and semantic heterogeneity are being generated. This paper presents a methodology for reconciling the semantic heterogeneity of metadata in bioinformatics data sources. The approach is based on the proposition that patterns of practice can be identified by globally monitoring, clustering, and visualizing bioinformatics metadata across disparately created data sources. Identifying these patterns can facilitate semantic reconciliation of metadata in current data and mitigate semantic heterogeneity in future data by promoting the sharing and reuse of existing metadata. To instantiate the methodology, a research architecture, MicroSEEDS, is presented, and its implementation and envisioned uses are discussed.
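
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or the metadata format, so everything here is an assumption made for illustration: the field names are hypothetical, and character-trigram Jaccard similarity with single-link grouping is a simple stand-in technique, not the method used in MicroSEEDS.

```python
from itertools import combinations


def trigrams(name):
    """Character trigrams of a field name, ignoring case and separators."""
    s = name.lower().replace("_", "").replace("-", "")
    # Names shorter than three characters fall back to the whole string.
    return {s[i:i + 3] for i in range(len(s) - 2)} or {s}


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def cluster_fields(fields, threshold=0.4):
    """Group field names whose trigram similarity reaches the threshold.

    Single-link grouping via union-find: two names end up in the same
    cluster if a chain of sufficiently similar pairs connects them.
    The threshold value is an illustrative assumption.
    """
    parent = {f: f for f in fields}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    grams = {f: trigrams(f) for f in fields}
    for a, b in combinations(fields, 2):
        if jaccard(grams[a], grams[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for f in fields:
        clusters.setdefault(find(f), []).append(f)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical metadata field names gathered from different data sources.
fields = ["gene_name", "GeneName", "organism", "Organism_Name", "seq_length"]
for group in cluster_fields(fields):
    print(group)
```

Groups with more than one member suggest fields that different sources may be naming differently for the same concept. Such groups are candidates for human review rather than automatic reconciliation.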