iKernel: Exact indexing for support vector machines

Authors:
Youngdae Kim;Ilhwan Ko;Wook-Shin Han;Hwanjo Yu
Venue:
Information Sciences: an International Journal
Year:
2014

Abstract

SVM (Support Vector Machine) is a well-established machine learning methodology widely used for learning classification, regression, and ranking functions. In particular, SVM for rank learning has been applied to various applications, including search engines and relevance feedback systems. In some search engines, a ranking function F learned by SVM becomes the query: a relevance function F is learned from the user's feedback, which expresses the user's search intention, and the top-k results are found by evaluating F on the entire database. This paper proposes an exact indexing solution for SVM function queries, that is, a way to find the top-k results without evaluating the entire database. Indexing for SVM faces new challenges: an index must be built on the kernel space (the SVM feature space), where (1) data points are invisible and (2) the distance function changes with queries. As a result, existing top-k query processing algorithms and existing metric-based or reference-based indexing methods are not applicable. We first propose key geometric properties of the kernel space, ranking instability and ordering stability, which are crucial for building indices in the kernel space. Based on them, we develop an index structure, iKernel, and its processing algorithms. We then present clustering techniques in the kernel space to enhance the pruning effectiveness of the index. According to our experiments, iKernel is highly effective overall, producing an evaluation ratio of 1-5% on large data sets.
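
For orientation, the exhaustive baseline that the abstract contrasts with, scoring every object in the database with F and keeping the k best, can be sketched as follows. This is only an illustration of the baseline, not the iKernel index or the paper's algorithms. The RBF kernel, the toy data, and all names (`rbf_kernel`, `make_ranking_function`, `exhaustive_top_k`) are assumptions made for the example. The bias term of the SVM is left out because a constant offset does not change the ranking.

```python
import heapq
import math


def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel exp(-gamma * ||u - v||^2). The kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def make_ranking_function(support_vectors, coefficients, kernel=rbf_kernel):
    """Build F(x) = sum_i coefficients[i] * K(support_vectors[i], x)."""
    def F(x):
        return sum(c * kernel(s, x) for s, c in zip(support_vectors, coefficients))
    return F


def exhaustive_top_k(database, F, k):
    """Baseline: evaluate F on every object and return the k highest (score, index) pairs."""
    scored = ((F(x), idx) for idx, x in enumerate(database))
    return heapq.nlargest(k, scored)


# Toy example (hypothetical data): one "relevant" and one "irrelevant" support vector.
support_vectors = [(0.0, 0.0), (4.0, 4.0)]
coefficients = [1.0, -1.0]
F = make_ranking_function(support_vectors, coefficients)

database = [(0.1, 0.0), (3.9, 4.0), (2.0, 2.0), (0.5, 0.5), (5.0, 5.0)]
top2 = exhaustive_top_k(database, F, k=2)
top_ids = [idx for _, idx in top2]
print(top_ids)
```

Every query in this sketch costs one evaluation of F per database object, and each evaluation costs one kernel computation per support vector. An evaluation ratio reported as a percentage reads most naturally as the share of objects that still need such an evaluation, though the abstract does not define the term.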