The Support Vector Machine (SVM) is a well-established machine learning methodology widely used for learning classification, regression, and ranking functions. In particular, SVM rank learning has been applied in applications such as search engines and relevance feedback systems. In some search engines, a ranking function F learned by SVM becomes the query: a relevance function F is learned from user feedback that expresses the user's search intention, and the top-k results are found by evaluating every object in the database with F. This paper proposes an exact indexing solution for SVM function queries, that is, a way to find the top-k results without evaluating the entire database. Indexing for SVM faces new challenges: the index must be built on the kernel space (the SVM feature space), where (1) data points are invisible and (2) the distance function changes with the query. As a result, existing top-k query processing algorithms and existing metric-based or reference-based indexing methods are not applicable. We first present key geometric properties of the kernel space, ranking instability and ordering stability, which are crucial for building indices in the kernel space. Based on these properties, we develop an index structure, iKernel, and its processing algorithms. We then present clustering techniques in the kernel space that enhance the pruning effectiveness of the index. In our experiments, iKernel is highly effective overall, achieving an evaluation ratio of 1-5% on large data sets.
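For illustration only, the exhaustive baseline that an index such as iKernel is meant to avoid can be sketched as follows. This is not the iKernel algorithm. The sketch assumes a kernel ranking function of the common form F(x) = sum_i c_i K(s_i, x) with an RBF kernel; the support vectors, coefficients, `gamma` value, and all function names are hypothetical and chosen only to make the example runnable.

```python
import heapq
import math


def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2). gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def make_ranking_function(support_vectors, coefficients, kernel=rbf_kernel):
    """Build F(x) = sum_i c_i * K(s_i, x) from (hypothetical) learned parameters."""
    def F(x):
        return sum(c * kernel(s, x) for s, c in zip(support_vectors, coefficients))
    return F


def top_k_exhaustive(database, F, k):
    """Baseline: score every object with F and keep the k highest-scoring ones."""
    scored = ((F(x), idx) for idx, x in enumerate(database))
    return heapq.nlargest(k, scored)  # list of (score, index), best first


# Toy data: one "relevant" and one "irrelevant" support vector (assumed values).
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
coefficients = [1.0, -1.0]
F = make_ranking_function(support_vectors, coefficients)

database = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.1, 0.0), (2.0, 2.0)]
top2 = top_k_exhaustive(database, F, k=2)
print([idx for _, idx in top2])  # prints [0, 3]
```

Every query in this baseline costs one evaluation of F per database object, which corresponds to an evaluation ratio of 100%. An exact index aims to return the same top-k set while evaluating F on only a small fraction of the objects.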