A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora

Authors:
Jahiruddin; Muhammad Abulaish; Lipika Dey
Affiliations:
Department of Computer Science, Jamia Millia Islamia (A Central University), New Delhi, India; Department of Computer Science, Jamia Millia Islamia (A Central University), New Delhi, India; Innovation Labs, Tata Consultancy Services, New Delhi, India
Venue:
Journal of Biomedical Informatics
Year:
2010

Abstract

A number of techniques, such as information extraction, document classification, document clustering and information visualization, have been developed to ease the extraction and understanding of information embedded within text documents. However, knowledge embedded in natural language texts is difficult to extract using simple pattern-matching techniques, and most of these methods do not help users directly understand the key concepts in a document corpus and their semantic relationships, which are critical for capturing its conceptual structure. The problem arises because most of the information is embedded within unstructured or semi-structured texts that computers cannot interpret easily. In this paper, we present a novel Biomedical Knowledge Extraction and Visualization framework, BioKEVis, to identify key information components from biomedical text documents. The information components are centered on key concepts, which BioKEVis identifies by applying linguistic analysis and Latent Semantic Analysis (LSA). The extraction of information components is based on natural language processing techniques and semantic analysis. The system is also integrated with a biomedical named entity recognizer, ABNER, to tag genes, proteins and other entity names in the text. We also present a method for collating information extracted from multiple sources to generate a semantic network. The network provides distinct user perspectives, allows navigation over documents with similar information components, and provides a comprehensive view of the collection. The system stores the extracted information components in a structured repository, which is integrated with a query-processing module to handle biomedical queries over text documents. We also propose a document ranking mechanism that presents retrieved documents in order of their relevance to the user query.
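
The collation and ranking steps described above can be illustrated with a minimal sketch. The abstract does not specify the repository schema or the ranking formula, so everything here is an assumption made for illustration: information components are modeled as hypothetical (concept, relation, concept) triples, the semantic network is a map from each triple to the documents that support it, and a document's score is simply the number of its triples that mention a query concept. The function names `build_network` and `rank_documents` and the sample triples are invented for this example and are not BioKEVis output or its API.

```python
from collections import defaultdict

# Hypothetical information components: (concept, relation, concept) triples
# keyed by document id. These are hand-written illustrations, not output
# produced by BioKEVis.
extracted = {
    "doc1": [("p53", "activates", "p21"), ("p53", "binds", "MDM2")],
    "doc2": [("MDM2", "inhibits", "p53"), ("p53", "activates", "p21")],
    "doc3": [("BRCA1", "interacts with", "RAD51")],
}


def build_network(components):
    """Merge triples from all documents into one map: triple -> supporting docs."""
    network = defaultdict(set)
    for doc_id, triples in components.items():
        for triple in triples:
            network[triple].add(doc_id)
    return network


def rank_documents(network, query_concepts):
    """Score each document by how many of its triples mention a query concept.

    This counting score is an assumed stand-in for the paper's ranking
    mechanism, which the abstract does not describe.
    """
    query = {c.lower() for c in query_concepts}
    scores = defaultdict(int)
    for (subj, _rel, obj), doc_ids in network.items():
        if subj.lower() in query or obj.lower() in query:
            for doc_id in doc_ids:
                scores[doc_id] += 1
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


network = build_network(extracted)
ranking = rank_documents(network, ["p53"])
print(ranking)
```

Keying the network on whole triples means that a relation reported by several documents becomes a single edge with multiple supporting sources, which is one simple way to navigate between documents that share an information component.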