On the potential of domain literature for clustering and Bayesian network learning

Authors:
Peter Antal; Patrick Glenisson; Geert Fannes
Affiliations:
Katholieke Universiteit Leuven, El. Eng. ESAT-SCD (SISTA), Kasteelpark Arenberg 10, B-3001 Leuven, Belgium (all authors)
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

Abstract

Thanks to its increasing availability, electronic literature can now be a major source of information when developing complex statistical models where data is scarce or noisy. This raises the question of how to integrate information from domain literature with statistical data. Because quantifying similarities or dependencies between variables is a basic building block in knowledge discovery, we consider the following question: which vector representations of text, and which statistical scores of similarity or dependency, best support the use of literature in statistical models? For the text source, we assume that annotations for the domain variables are available as short free-text descriptions and, optionally, that a large literature repository is available from which the annotations can be further expanded. For evaluation, we contrast the variable similarities or dependencies obtained from text, using different annotation sources and vector representations, with those obtained from measurement data or expert assessments. Specifically, we consider two learning problems: clustering and Bayesian network learning. First, we report performance (against an expert reference) for clustering yeast genes from textual annotations. Second, we assess the agreement between text-based and data-based scores of variable dependencies when learning Bayesian network substructures for the task of modeling the joint distribution of clinical measurements of ovarian tumors.