Set-based model: a new approach for information retrieval

Authors:
Bruno Pôssas; Nivio Ziviani; Wagner Meira, Jr.; Berthier Ribeiro-Neto
Affiliations:
Universidade Federal de Minas Gerais, Belo Horizonte-MG, Brazil (all authors)
Venue:
SIGIR '02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Year:
2002



Abstract

The objective of this paper is to present a new technique for computing the weights of index terms, which leads to a new ranking mechanism referred to as the set-based model. The components of our model are no longer individual terms but termsets, that is, sets of terms. The novelty is that we compute term weights using a data mining technique, association rules, which is time-efficient and still improves retrieval effectiveness. The similarity function of the set-based model, which compares a document with a query, considers the frequency of each termset in the document and its scarcity in the document collection. Experimental results show that our model improves the average precision of the answer set for all three collections evaluated. For the TREC-3 collection, our set-based model led to a gain, relative to the standard vector space model, of 37% in average precision curves and of 57% in average precision for the top 10 documents. Like the vector space model, the set-based model has time complexity that is linear in the number of documents in the collection.
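
The idea of scoring a document by termset frequency and termset scarcity can be illustrated with a minimal sketch. This is not the authors' formulation, which the abstract does not spell out. The function names (`query_termsets`, `termset_frequency`, `set_based_scores`), the `max_size` parameter, the "rarest member term" definition of termset frequency, and the `log(N / df)` scarcity factor are all assumptions chosen by analogy with tf-idf. The sketch also enumerates query termsets by brute force instead of mining them with an association rule algorithm.

```python
import math
from itertools import combinations


def query_termsets(query_terms, max_size=2):
    """Enumerate non-empty subsets of the query terms, up to max_size terms each.

    Assumption: brute-force enumeration stands in for association rule mining.
    """
    terms = sorted(set(query_terms))
    return [
        frozenset(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(terms, size)
    ]


def termset_frequency(termset, doc_tokens):
    """Assumed definition: a termset occurs as often as its rarest member term."""
    return min(doc_tokens.count(term) for term in termset)


def set_based_scores(query_terms, docs, max_size=2):
    """Score each document as the sum, over query termsets, of
    (termset frequency in the document) * (scarcity in the collection).

    Scarcity is assumed to be log(N / df), where df is the number of
    documents containing every term of the termset.
    """
    n_docs = len(docs)
    scores = [0.0] * n_docs
    for termset in query_termsets(query_terms, max_size):
        freqs = [termset_frequency(termset, doc) for doc in docs]
        df = sum(1 for f in freqs if f > 0)
        if df == 0:
            continue  # termset absent from the collection: contributes nothing
        scarcity = math.log(n_docs / df)
        for i, f in enumerate(freqs):
            scores[i] += f * scarcity
    return scores


if __name__ == "__main__":
    # Toy collection: each document is a list of tokens.
    docs = [
        ["set", "based", "model"],
        ["vector", "space", "model"],
        ["set", "theory"],
    ]
    for doc, s in zip(docs, set_based_scores(["set", "model"], docs)):
        print(" ".join(doc), round(s, 3))
```

For a fixed query, each termset requires one pass over the documents, so the cost of this sketch grows linearly with the number of documents, in line with the complexity claim in the abstract. In the toy collection, only the first document contains both query terms, so it alone receives the contribution of the two-term termset and ranks first.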