Distance function computation is a key subtask in many data mining algorithms and applications. The most effective form of a distance function can only be expressed in the context of a particular data domain, and finding that form is often a challenging, non-trivial task. For example, in the text domain, distance function design has been considered so important and complex that it has been the focus of intensive research over three decades. The final design of distance functions in this domain was reached only through detailed empirical testing and consensus on the quality of the results provided by the different variations. With the increasing ability to collect data in an automated way, the number of new kinds of data continues to grow rapidly, which makes it increasingly difficult to undertake such efforts for every new data type. The most important aspect of distance function design is that, since a human is the end user of any application, the design must satisfy the user's requirements with regard to effectiveness. This creates the need for a systematic framework for designing distance functions that are sensitive to the particular characteristics of the data domain. In this paper, we discuss such a framework. The goal is to create distance functions in an automated way while minimizing the work required from the user. We will show that this framework creates distance functions that are significantly more effective than popularly used functions such as the Euclidean metric.
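To make the contrast with the Euclidean metric concrete, the following is a minimal sketch rather than the framework discussed in the paper. It assumes a simple weighted Euclidean form as a stand-in for a domain-sensitive distance function; the function names, the weights, and the data points are all hypothetical and chosen only for illustration.

```python
import math


def euclidean(x, y):
    """Plain Euclidean distance: every dimension counts equally."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def weighted_euclidean(x, y, weights):
    """Euclidean distance with one non-negative weight per dimension.

    The weights are an illustrative assumption: they stand in for whatever
    domain knowledge or user feedback says about each dimension's relevance.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical data: the second dimension has a much larger scale and is
# assumed to matter far less to the user than the first.
query = [1.0, 200.0]
a = [1.2, 900.0]   # close to the query in the first dimension
b = [5.0, 210.0]   # close to the query in the second dimension
weights = [1.0, 1e-6]

# The unweighted metric is dominated by the large-scale dimension and ranks b first.
print(euclidean(query, a), euclidean(query, b))

# The weighted form ranks a first, matching the assumed user preference.
print(weighted_euclidean(query, a, weights), weighted_euclidean(query, b, weights))
```

With these assumed weights the nearest neighbor of the query changes from `b` to `a`, which is the kind of domain-dependent behavior a fixed Euclidean metric cannot express.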