Exact indexing for support vector machines

Authors:
Hwanjo Yu;Ilhwan Ko;Youngdae Kim;Seungwon Hwang;Wook-Shin Han
Affiliations:
POSTECH (Pohang University of Science and Technology), Pohang, South Korea;POSTECH (Pohang University of Science and Technology), Pohang, South Korea;POSTECH (Pohang University of Science and Technology), Pohang, South Korea;POSTECH (Pohang University of Science and Technology), Pohang, South Korea;Kyungpook National University, Daegu, South Korea
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

SVM (Support Vector Machine) is a well-established machine learning methodology widely used for classification, regression, and ranking. Recently, SVM has been actively researched for rank learning and applied to various applications, including search engines and relevance feedback systems. A query in such systems is the ranking function F learned by SVM. Once the function F has been learned (that is, once the query has been formulated), processing the query to find the top-k results requires evaluating the entire database with F. So far, no exact indexing solution exists for SVM functions. Existing top-k query processing algorithms are not applicable to machine-learned ranking functions, as they often make restrictive assumptions about the query, such as linearity or monotonicity of the function. Existing metric-based or reference-based indexing methods are also not applicable, because data points are invisible in the kernel space (the SVM feature space) on which the index must be built. Existing kernel indexing methods either return approximate results or fix the kernel parameters. This paper proposes an exact indexing solution for SVM functions with varying kernel parameters. We first propose key geometric properties of the kernel space, ranking instability and ordering stability, which are crucial for building indices in the kernel space. Based on them, we develop an index structure, iKernel, and its processing algorithms. We then present clustering techniques in the kernel space to enhance the pruning effectiveness of the index. In our experiments, iKernel is highly effective overall, yielding an evaluation ratio of 1% to 5% on large data sets. To the best of our knowledge, iKernel is the first indexing solution that finds the exact top-k results of SVM functions without a full scan of the data set.
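
To make the cost that motivates the index concrete, the following is a minimal sketch of the full-scan baseline the abstract refers to: every database point is scored by F and the k highest are kept. It is not the iKernel method. The Gaussian (RBF) kernel, the form F(x) = sum_i c_i K(s_i, x), and all names and toy values (`rbf_kernel`, `make_ranking_function`, `top_k_full_scan`, `gamma`) are assumptions chosen for illustration and do not come from the paper.

```python
import heapq
import math


def rbf_kernel(u, v, gamma):
    """Gaussian (RBF) kernel; gamma is the (assumed) kernel parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def make_ranking_function(support_vectors, coefficients, gamma):
    """Build F(x) = sum_i c_i * K(s_i, x) from hypothetical learned values."""
    def F(x):
        return sum(c * rbf_kernel(sv, x, gamma)
                   for sv, c in zip(support_vectors, coefficients))
    return F


def top_k_full_scan(database, F, k):
    """Baseline: evaluate F on every point and return the k highest-scoring."""
    return heapq.nlargest(k, database, key=F)


# Toy data (illustrative only): two support vectors with made-up coefficients.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
coefficients = [1.0, -0.5]
F = make_ranking_function(support_vectors, coefficients, gamma=1.0)

database = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (2.0, 2.0), (0.5, 0.5)]
top2 = top_k_full_scan(database, F, k=2)
print(top2)
```

In this sketch the number of evaluations of F always equals the database size, which is the cost an exact index aims to avoid. Changing `gamma` changes the scores and can change the ranking, which is why an index built for one fixed kernel parameter would not serve queries with varying parameters.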