Improving the efficiency of the Bayesian network retrieval model by reducing relationships between terms

Authors:
Luis M. de Campos;Juan M. Fernández-Luna;Juan F. Huete
Affiliations:
Departamento de Ciencias de la Computación e Inteligencia Artificial, E.T.S.I. Informática. Universidad de Granada, 18071, Granada. Spain;Departamento de Ciencias de la Computación e Inteligencia Artificial, E.T.S.I. Informática. Universidad de Granada, 18071, Granada. Spain;Departamento de Ciencias de la Computación e Inteligencia Artificial, E.T.S.I. Informática. Universidad de Granada, 18071, Granada. Spain
Venue:
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems - Intelligent information systems
Year:
2003

Abstract

The Bayesian Network Retrieval Model represents the main (in)dependence relationships between the terms of a document collection by means of a specific type of Bayesian network, namely a polytree. Although the learning and propagation algorithms designed for this topology are very efficient, both tasks can still be very time-consuming in collections with a very large number of terms. This paper shows that reducing the size of the polytree, so that it comprises only a subset of terms selected according to their retrieval quality, maintains the performance of the model while considerably reducing the effort needed to learn the model and later propagate in it. A method for selecting the best terms, based on their inverse document frequency and term discrimination value, is also presented.
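
As a rough illustration of the two statistics named in the abstract, the Python sketch below computes an inverse document frequency and a term discrimination value for each term of a small tokenized collection. It is not the authors' method: the abstract does not say how the two measures are combined, so the rule in `select_terms` (keep terms with a positive discrimination value, then rank by idf) is an assumption, and all function names are hypothetical. The discrimination value uses the common centroid-based "space density" formulation, which may differ from the one used in the paper.

```python
import math
from collections import Counter


def idf_scores(docs):
    """Inverse document frequency log(N / n_t) for every term t."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log(n_docs / df) for t, df in doc_freq.items()}


def _cosine(u, v):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def _density(vectors):
    """Average similarity of the documents to their centroid."""
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)  # sums weights; cosine ignores the scale
    return sum(_cosine(vec, centroid) for vec in vectors) / len(vectors)


def discrimination_values(docs):
    """Density without the term minus density with it.

    A positive value means the term spreads documents apart
    (a good discriminator under this formulation).
    """
    vectors = [Counter(doc) for doc in docs]
    base = _density(vectors)
    vocab = set().union(*docs)
    values = {}
    for term in vocab:
        reduced = [
            {t: w for t, w in vec.items() if t != term} for vec in vectors
        ]
        values[term] = _density(reduced) - base
    return values


def select_terms(docs, k):
    """ASSUMED combination rule, not taken from the paper."""
    idf = idf_scores(docs)
    dv = discrimination_values(docs)
    candidates = [t for t in idf if dv[t] > 0]
    # Highest idf first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda t: (-idf[t], t))
    return candidates[:k]


if __name__ == "__main__":
    collection = [
        ["bayesian", "network", "retrieval"],
        ["bayesian", "network", "polytree"],
        ["network", "term", "selection"],
    ]
    print(select_terms(collection, k=3))
```

Under these assumptions, a term that occurs in every document gets an idf of zero and a negative discrimination value, so it would be excluded from the reduced term set. Recomputing the density once per term is quadratic in practice and is only meant to show the idea on a toy collection.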