SVM (Support Vector Machine) is a well-established machine learning methodology widely used for classification, regression, and ranking. Recently, SVM has been actively researched for rank learning and applied to various applications, including search engines and relevance feedback systems. A query in such systems is the ranking function F learned by SVM. Once F has been learned (that is, once the query has been formulated), processing the query to find the top-k results requires evaluating every object in the database with F. So far, no exact indexing solution exists for SVM functions. Existing top-k query processing algorithms are not applicable to machine-learned ranking functions, because they often make restrictive assumptions about the query, such as linearity or monotonicity of the function. Existing metric-based or reference-based indexing methods are also not applicable, because data points are invisible in the kernel space (the SVM feature space) on which the index must be built. Existing kernel indexing methods return approximate results or fix the kernel parameters. This paper proposes an exact indexing solution for SVM functions with varying kernel parameters. We first present key geometric properties of the kernel space, ranking instability and ordering stability, which are crucial for building indices in the kernel space. Based on them, we develop an index structure, iKernel, and its processing algorithms. We then present clustering techniques in the kernel space to enhance the pruning effectiveness of the index. In our experiments, iKernel is highly effective overall, producing an evaluation ratio of 1-5% on large data sets. To the best of our knowledge, iKernel is the first indexing solution that finds exact top-k results of SVM functions without a full scan of the data set.
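For orientation, the full-scan baseline that the abstract contrasts against can be sketched as follows. This is not iKernel itself, only a minimal illustration of evaluating a learned function F on every object and keeping the k highest scores. The RBF kernel, the function names (`rbf_kernel`, `svm_score`, `top_k_full_scan`), and the toy data are assumptions made for the example; the abstract does not specify them.

```python
import heapq
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2).

    The kernel choice is an assumption; gamma stands in for a kernel parameter.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_score(x, support_vectors, coeffs, gamma):
    """F(x) = sum_i coeff_i * K(sv_i, x).

    The bias term is omitted because a constant offset does not change ranking.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coeffs))


def top_k_full_scan(database, support_vectors, coeffs, gamma, k):
    """Exact top-k by scoring every object: the cost an index aims to avoid."""
    scored = ((svm_score(x, support_vectors, coeffs, gamma), i)
              for i, x in enumerate(database))
    # nlargest returns (score, object id) pairs, highest score first.
    return heapq.nlargest(k, scored)


# Hypothetical toy data: four 2-D objects and a single support vector.
database = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (0.9, 1.1)]
support_vectors = [(1.0, 1.0)]
coeffs = [1.0]

result = top_k_full_scan(database, support_vectors, coeffs, gamma=0.5, k=2)
top_ids = [i for _, i in result]
```

Every query in this sketch costs one kernel evaluation per object per support vector, which is why an exact index that prunes most objects before scoring would matter on large data sets.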