The objective of this paper is to present a new technique for computing weights for index terms, which leads to a new ranking mechanism referred to as the set-based model. The components of our model are no longer individual terms but termsets, that is, sets of terms. The novelty is that we compute term weights using a data mining technique called association rules, which is time efficient and yet yields notable improvements in retrieval effectiveness. The set-based model's function for computing the similarity between a document and a query considers both the frequency of each termset in the document and its scarcity in the document collection. Experimental results show that our model improves the average precision of the answer set for all three collections evaluated. For the TREC-3 collection, the set-based model achieved gains, relative to the standard vector space model, of 37% in average precision and of 57% in average precision for the top 10 documents. Like the vector space model, the set-based model has time complexity that is linear in the number of documents in the collection.
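The ranking idea summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Only the frequent-itemset phase of association-rule mining is used here, to enumerate (Apriori-style) the query termsets that co-occur in at least `min_support` documents. The function names `enumerate_termsets` and `set_based_rank` and the `min_support` parameter are hypothetical. The frequency of a termset in a document is assumed to be the minimum frequency of its member terms. The weight is assumed to take a tf-idf-like form, (1 + log sf) · log(1 + N / ds), where ds is the number of documents containing the termset. Document-length normalization is omitted for brevity.

```python
from collections import Counter
from itertools import combinations
from math import log


def enumerate_termsets(query_terms, doc_counts, min_support=1):
    """Apriori-style search for query termsets that co-occur in >= min_support docs.

    doc_counts: one Counter (term -> frequency) per document.
    Returns {frozenset termset: set of ids of documents containing all its terms}.
    """
    level = {}
    for term in sorted(set(query_terms)):
        ids = {i for i, counts in enumerate(doc_counts) if counts[term] > 0}
        if len(ids) >= min_support:
            level[frozenset([term])] = ids
    found = dict(level)
    k = 2
    while level:
        candidates = {}
        for a, b in combinations(list(level), 2):
            cand = a | b
            if len(cand) != k or cand in candidates:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if any(frozenset(s) not in level for s in combinations(cand, k - 1)):
                continue
            ids = level[a] & level[b]
            if len(ids) >= min_support:
                candidates[cand] = ids
        found.update(candidates)
        level = candidates
        k += 1
    return found


def set_based_rank(query_terms, docs, min_support=1):
    """Rank tokenized docs by a tf-idf-style inner product over query termsets (hypothetical)."""
    doc_counts = [Counter(doc) for doc in docs]
    n_docs = len(docs)
    termsets = enumerate_termsets(query_terms, doc_counts, min_support)
    scores = [0.0] * n_docs
    for termset, ids in termsets.items():
        # Scarcity of the termset in the collection (inverse document frequency).
        idf = log(1 + n_docs / len(ids))
        for j in ids:
            # Assumption: termset frequency = min frequency of its member terms.
            sf = min(doc_counts[j][t] for t in termset)
            w_doc = (1 + log(sf)) * idf
            w_query = idf  # each query termset assumed to occur once in the query
            scores[j] += w_doc * w_query
    ranking = sorted(range(n_docs), key=lambda j: scores[j], reverse=True)
    return ranking, scores


if __name__ == "__main__":
    docs = [
        "set based model termsets association rules".split(),
        "vector space model terms".split(),
        "association rules mining large databases".split(),
    ]
    ranking, scores = set_based_rank(["association", "rules", "model"], docs)
    print(ranking)  # [0, 2, 1]
```

In this toy run the first document ranks highest. It matches the single query terms and also the rarer termsets such as {association, model, rules}, which receive a larger scarcity weight. The scoring loop touches each document at most once per enumerated termset, so for a fixed query its cost grows linearly with the number of documents. That behavior is consistent with the complexity claim above, although the actual model's weighting and normalization may differ from this sketch.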