Ranked queries are used to locate relevant documents in text databases. In a ranked query a list of terms is specified, then the documents that most closely match the query are returned, in decreasing order of similarity, as answers. Crucial to the efficacy of ranked querying is the use of a similarity heuristic, a mechanism that assigns a numeric score indicating how closely a document and the query match. In this note we explore and categorise a range of similarity heuristics described in the literature. We have implemented all of these measures in a structured way, and have carried out retrieval experiments with a substantial subset of these measures.

Our purpose with this work is threefold: first, in enumerating the various measures in an orthogonal framework we make it straightforward for other researchers to describe and discuss similarity measures; second, by experimenting with a wide range of the measures, we hope to observe which features yield good retrieval behaviour in a variety of retrieval environments; and third, by describing our results so far, we hope to gather feedback on the issues we have uncovered. We demonstrate that it is surprisingly difficult to identify which techniques work best, and comment on the experimental methodology required to support any claims as to the superiority of one method over another.
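To make the idea of a ranked query concrete, the following is a minimal sketch of one familiar similarity heuristic: a cosine-style measure with logarithmic term-frequency and inverse-document-frequency weights. It is an illustration only, not necessarily one of the formulations examined in this note. The function name `rank`, the whitespace tokenisation, and the specific weighting formulas are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def rank(query, documents, top_k=None):
    """Return (doc_index, score) pairs in decreasing order of similarity.

    Hypothetical helper: the weighting choices below are one assumed
    combination, not a definitive similarity measure.
    """
    # Assumption: lower-cased whitespace tokenisation stands in for real parsing.
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in tokenised:
        df.update(set(tokens))

    def idf(term):
        # Assumed inverse document frequency; terms absent from the
        # collection contribute nothing.
        return math.log(1 + n_docs / df[term]) if df[term] else 0.0

    query_tf = Counter(query.lower().split())
    results = []
    for index, tokens in enumerate(tokenised):
        tf = Counter(tokens)
        # Document-term weight: dampened term frequency times idf.
        weights = {t: (1 + math.log(f)) * idf(t) for t, f in tf.items()}
        # Document length used for cosine-style normalisation.
        length = math.sqrt(sum(w * w for w in weights.values()))
        # Query-term weight: raw query frequency times idf.
        score = sum(weights.get(t, 0.0) * qf * idf(t)
                    for t, qf in query_tf.items())
        if length > 0:
            score /= length
        if score > 0:
            results.append((index, score))

    # Highest score first; ties broken by document index for determinism.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results[:top_k]
```

Calling `rank("cat", docs)` on a small list of strings returns the matching documents ordered by score. Swapping out the term-frequency dampening, the inverse-document-frequency formula, or the length normalisation yields a different similarity heuristic, which is the kind of variation the note sets out to categorise.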