International Journal of Multimedia Data Engineering & Management
The amount of multimedia data grows daily thanks to less expensive storage devices and an increasing number of information sources, so machine learning algorithms must cope with large and noisy datasets. Fortunately, a well-chosen training sample significantly influences the final results. A simple random sample (SRS), however, may not give satisfactory results: because it selects items blindly, it may not adequately represent a large and noisy dataset. The difficulty is particularly apparent for huge datasets where, due to memory constraints, only very small samples can be used, as is typically the case in multimedia applications. In this article we propose a new and efficient method for sampling large and noisy multimedia data. The method is based on a simple distance measure that compares the histogram of the sample set with that of the whole set in order to estimate the representativeness of the sample. It also handles noise gracefully, something SRS and other methods are unable to do. We experiment on image and audio datasets. Comparison with SRS and other methods shows that the proposed method is vastly superior in terms of sample representativeness, particularly for small sample sizes, while its running time is comparable to that of SRS, the least expensive method in terms of time.
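The idea of scoring a sample by comparing its histogram with that of the whole set can be illustrated with a minimal Python sketch. This is not the article's algorithm: the abstract does not specify the distance measure, the binning, or the selection procedure, so the L1 distance, the equal-width bins, and the "keep the best of k random draws" strategy below are all assumptions, and the function names are hypothetical.

```python
import random


def normalized_histogram(values, bins, lo, hi):
    """Return equal-width bin frequencies (summing to 1) for values in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the index so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def histogram_distance(sample, population, bins=10):
    """L1 distance between sample and population histograms (assumed measure).

    0 means identical bin frequencies; 2 is the largest possible value.
    """
    lo, hi = min(population), max(population)
    if lo == hi:
        return 0.0  # every value is identical, so any sample matches
    h_sample = normalized_histogram(sample, bins, lo, hi)
    h_population = normalized_histogram(population, bins, lo, hi)
    return sum(abs(a - b) for a, b in zip(h_sample, h_population))


def best_of_k_samples(population, sample_size, k=20, bins=10, seed=0):
    """Draw k simple random samples and keep the one closest to the population.

    This selection strategy is an illustrative assumption, not the
    procedure proposed in the article.
    """
    rng = random.Random(seed)
    best, best_distance = None, float("inf")
    for _ in range(k):
        candidate = rng.sample(population, sample_size)
        d = histogram_distance(candidate, population, bins)
        if d < best_distance:
            best, best_distance = candidate, d
    return best, best_distance
```

With `k=1` the function reduces to plain SRS, so comparing the returned distance for `k=1` and a larger `k` gives a rough sense of how much a histogram-based check can improve representativeness at small sample sizes.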