Finding a small set of representative instances in a large dataset benefits data mining practitioners in two ways: (1) they can build a learner superior to one constructed from the entire massive dataset; and (2) they can avoid repeatedly working on the whole original dataset. In this paper we propose a Scalable Representative Instance Selection And Ranking (SRISTAR, pronounced 3STAR) mechanism, which carries two unique features: (1) it produces a ranked list of representative instances, so users can always select instances from top to bottom according to the number of examples they prefer; and (2) it examines the behaviors of the underlying instances during selection, and the selection procedure tries to minimize the expected future error. Given a dataset, we first cluster the instances into small data cells, each consisting of instances with similar behaviors. We then progressively evaluate the data cells and their combinations, ordering them into a list such that learners built from the top cells are more accurate.
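The cluster-then-rank pipeline described above can be sketched in a few lines. This is an illustrative toy only: the abstract does not specify the clustering method or the learner, so coarse grid binning stands in for the behavior-based clustering into data cells, and a 1-NN classifier evaluated on a held-out validation set stands in as a proxy for the expected future error that the ranking tries to optimize.

```python
import random

random.seed(0)

# Synthetic two-class data: class 0 around (0, 0), class 1 around (5, 5).
def make_data(n):
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        cx, cy = (0.0, 0.0) if label == 0 else (5.0, 5.0)
        data.append(((cx + random.gauss(0, 0.7), cy + random.gauss(0, 0.7)), label))
    return data

train = make_data(200)
valid = make_data(100)

# Step 1: group training instances into "data cells".  Coarse grid binning is
# a stand-in for the paper's behavior-based clustering, which the abstract
# does not detail.
cells = {}
for (x, y), label in train:
    cells.setdefault((int(x // 2), int(y // 2)), []).append(((x, y), label))

def knn_accuracy(selected, validation):
    """Accuracy of a 1-NN learner built from the selected instances."""
    if not selected:
        return 0.0
    correct = 0
    for (vx, vy), label in validation:
        nearest = min(selected,
                      key=lambda p: (p[0][0] - vx) ** 2 + (p[0][1] - vy) ** 2)
        correct += nearest[1] == label
    return correct / len(validation)

# Step 2: greedily rank the cells -- at each step add the cell that most
# improves validation accuracy of the learner built from the pool so far,
# a proxy for minimizing the expected future error.
ranking, pool = [], []
remaining = dict(cells)
while remaining:
    best = max(remaining, key=lambda k: knn_accuracy(pool + remaining[k], valid))
    pool += remaining.pop(best)
    ranking.append(best)

print("ranked cells:", ranking)
print("final accuracy: %.2f" % knn_accuracy(pool, valid))
```

A user who wants only k instances would take cells from the front of `ranking` until k instances are collected, matching the top-to-bottom selection the abstract describes.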