Adjusting Fuzzy Similarity Functions for use with standard data mining tools

Authors:
Avichai Meged;Roy Gelbard
Affiliations:
Information System Program, Graduate School of Business Administration, Bar-Ilan University, Ramat-Gan 52900, Israel;Information System Program, Graduate School of Business Administration, Bar-Ilan University, Ramat-Gan 52900, Israel
Venue:
Journal of Systems and Software
Year:
2011

Abstract

Data mining is crucial in many areas, and there are ongoing efforts to improve its effectiveness in both the scientific and the business world. There is a clear need to improve the outcomes of mining techniques such as clustering and classification without abandoning the standard mining tools that are popular with researchers and practitioners alike. Currently, however, standard tools lack the flexibility to control similarity relations between attribute values, a critical feature for improving clustering results. The study presented here introduces the Similarity Adjustment Model (SAM), in which adjusted Fuzzy Similarity Functions (FSF) control similarity relations between attribute values and hence improve the clustering results obtained with standard data mining tools such as SPSS and SAS. The SAM draws on principles of binary database representation models and employs FSF adjusted via an iterative learning process that yields improved segmentation regardless of the choice of clustering algorithm. The SAM is illustrated and evaluated on three common datasets with the standard SPSS package, each run with several clustering algorithms. Comparison of "Naive" runs (which used the original data) and "Fuzzy" runs (which used the SAM) shows that the SAM improves segmentation in all cases.
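
The idea of controlling similarity relations between attribute values on top of a binary representation can be illustrated with a minimal sketch. It is not the authors' SAM implementation: the triangular similarity function, the `spread` parameter, the example domain, and all function names are assumptions chosen for illustration. It shows only that a plain binary (one-hot) encoding treats every pair of distinct values as equally distant, while a fuzzy encoding lets neighboring values appear closer to a distance-based clustering tool.

```python
from math import sqrt

def binary_encode(value, domain):
    """Plain binary representation: one column per domain value."""
    return [1.0 if value == v else 0.0 for v in domain]

def fuzzy_encode(value, domain, spread):
    """Hypothetical fuzzy similarity function (triangular, assumed here).

    Each column holds the similarity between `value` and that column's
    domain value; similarity decays linearly with ordinal distance and
    reaches zero at `spread` positions away.
    """
    i = domain.index(value)
    return [max(0.0, 1.0 - abs(i - j) / spread) for j in range(len(domain))]

def euclidean(a, b):
    """Distance as a standard clustering tool might compute it."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative ordinal attribute (an assumption, not data from the paper).
domain = ["low", "medium", "high", "very_high"]
spread = 2.0  # assumed tuning knob; an iterative process could adjust it

naive_near = euclidean(binary_encode("low", domain),
                       binary_encode("medium", domain))
naive_far = euclidean(binary_encode("low", domain),
                      binary_encode("very_high", domain))
fuzzy_near = euclidean(fuzzy_encode("low", domain, spread),
                       fuzzy_encode("medium", domain, spread))
fuzzy_far = euclidean(fuzzy_encode("low", domain, spread),
                      fuzzy_encode("very_high", domain, spread))

print(naive_near, naive_far)  # identical: one-hot ignores value order
print(fuzzy_near, fuzzy_far)  # adjacent values are now closer than distant ones
```

Because the encoded columns are ordinary numeric fields, a table prepared this way could in principle be fed unchanged to a standard package, which is the compatibility property the abstract emphasizes.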