This work presents a new approach for ranking documents in the vector space model. The novelty is twofold. First, patterns of term co-occurrence are taken into account and are processed efficiently. Second, term weights are generated using a data mining technique called association rules. This leads to a new ranking mechanism called the set-based vector model. The components of our model are no longer index terms but index termsets, where a termset is a set of index terms. Termsets capture the intuition that semantically related terms appear close to each other in a document. They can be obtained efficiently by limiting the computation to small passages of text. Once termsets have been computed, the ranking is calculated as a function of each termset's frequency in the document and its scarcity in the document collection. Experimental results show that the set-based vector model improves average precision for all collections and query types evaluated, while keeping computational costs small. For the 2-gigabyte TREC-8 collection, the set-based vector model improves average precision by 14.7% for disjunctive queries and 16.4% for conjunctive queries, relative to the standard vector space model. These gains increase to 24.9% and 30.0%, respectively, when proximity information is taken into account. Query processing times are longer but, on average, still comparable to those obtained with the standard vector model (increases in processing time varied from 30% to 300%). Our results suggest that the set-based vector model provides a correlation-based ranking formula that is effective with general collections and computationally practical.
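To make the idea of ranking by termsets concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not specify: each document is represented as a list of short passages (lists of tokens), a termset's frequency in a document is taken to be the number of passages containing all of its terms, scarcity is an IDF-style factor `log(1 + N / df)`, and the score is a plain unnormalized sum over all non-empty subsets of the query terms. The function names and the sample collection are hypothetical.

```python
import math
from itertools import combinations


def query_termsets(query_terms):
    """Enumerate every non-empty subset of the query terms (assumed termset space)."""
    terms = sorted(set(query_terms))
    return [
        frozenset(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    ]


def termset_frequency(termset, passages):
    """Count passages that contain all terms of the termset (assumed definition)."""
    return sum(1 for passage in passages if termset <= set(passage))


def rank(query_terms, collection):
    """Score documents by summing termset weights: frequency times scarcity."""
    n_docs = len(collection)
    termsets = query_termsets(query_terms)
    freqs = {
        doc_id: {ts: termset_frequency(ts, passages) for ts in termsets}
        for doc_id, passages in collection.items()
    }
    scores = {doc_id: 0.0 for doc_id in collection}
    for ts in termsets:
        # Number of documents in which the termset occurs at least once.
        df = sum(1 for doc_id in freqs if freqs[doc_id][ts] > 0)
        if df == 0:
            continue
        scarcity = math.log(1 + n_docs / df)  # assumed IDF-style factor
        for doc_id in freqs:
            tf = freqs[doc_id][ts]
            if tf > 0:
                scores[doc_id] += (1 + math.log(tf)) * scarcity
    # Highest score first; ties keep insertion order because sorted() is stable.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy collection: document id -> list of passages.
collection = {
    "d1": [["set", "based", "model"], ["vector", "model"]],
    "d2": [["vector", "space"], ["set", "theory"]],
    "d3": [["cooking", "recipes"]],
}

ranking = rank(["set", "model"], collection)
print([doc_id for doc_id, _ in ranking])
```

In this toy run, `d1` benefits from the termset {set, model}, whose terms co-occur in one passage, whereas `d2` matches only the single-term termset {set}. Restricting co-occurrence to passages is one simple way to reflect the proximity intuition described above; a fuller treatment would also prune the termsets (for example with a frequent-itemset miner) and normalize scores by document length.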