Mining semantically related terms from biomedical literature

Authors:
Goran Nenadić; Sophia Ananiadou
Affiliations:
University of Manchester and National Centre for Text Mining, Manchester, UK; University of Manchester and National Centre for Text Mining, Manchester, UK
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2006

Abstract

Discovering links and relationships is one of the main challenges in biomedical research, as scientists are interested in uncovering entities that have similar functions, take part in the same processes, or are coregulated. This article discusses the extraction of such semantically related entities (represented by domain terms) from biomedical literature. The method combines various text-based aspects, such as lexical, syntactic, and contextual similarities between terms. Lexical similarities are based on the level of sharing of word constituents. Syntactic similarities rely on expressions (such as term enumerations and conjunctions) in which a sequence of terms appears as a single syntactic unit. Finally, contextual similarities are based on automatic discovery of relevant contexts shared among terms. The approach is evaluated using the Genia resources, and the results of experiments are presented. Lexical and syntactic links have shown high precision and low recall, while contextual similarities have resulted in significantly higher recall with moderate precision. By combining the three metrics, we achieved F measures of 68% for semantically related terms and 37% for highly related entities.
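
The abstract names three similarity aspects and a combined score but gives no formulas, so the following is only a minimal sketch under stated assumptions. It uses a Dice overlap of word constituents as a stand-in for lexical similarity, a weighted linear sum as a stand-in for the combination step, and the standard balanced F-measure for evaluation. The function names, the Dice measure, and the weights are all hypothetical and are not taken from the article.

```python
from typing import Dict, Set


def lexical_similarity(term_a: str, term_b: str) -> float:
    """Dice overlap of word constituents.

    Assumption: the abstract only says lexical similarity reflects the
    "level of sharing of word constituents"; Dice is an illustrative choice.
    """
    words_a: Set[str] = set(term_a.lower().split())
    words_b: Set[str] = set(term_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return 2 * len(words_a & words_b) / (len(words_a) + len(words_b))


def combined_similarity(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-aspect scores.

    Assumption: a linear combination with hypothetical weights; a missing
    aspect score is treated as 0.0.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * scores.get(name, 0.0) for name, w in weights.items()) / total


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    lex = lexical_similarity("nuclear receptor", "orphan nuclear receptor")
    # Syntactic and contextual scores are made-up inputs for illustration.
    scores = {"lexical": lex, "syntactic": 0.0, "contextual": 0.4}
    weights = {"lexical": 1.0, "syntactic": 1.0, "contextual": 2.0}
    print(lex, combined_similarity(scores, weights), f_measure(0.5, 0.5))
```

In this sketch, two terms that share most of their words score high lexically even when no enumeration or shared context links them. That is consistent with the abstract's observation that lexical links are precise but cover few pairs, and that contextual evidence is what raises recall.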