Data Mining by Means of Binary Representation: A Model for Similarity and Clustering

Authors:
Zippy Erlich; Roy Gelbard; Israel Spiegler
Affiliations:
Computer Science Department, The Open University, Tel Aviv 61392, Israel; Technology and Information Systems Department, The Leon Recanati Graduate School of Business Administration, Tel Aviv University, Tel Aviv 69978, Israel; Technology and Information Systems Department, The Leon Recanati Graduate School of Business Administration, Tel Aviv University, Tel Aviv 69978, Israel
Venue:
Information Systems Frontiers
Year:
2002

Abstract

In this paper we outline a new clustering method based on a binary representation of data records. The binary database relates each entity to all possible attribute values (the domain) that the entity may assume. The resulting binary matrix allows similarity and clustering calculations that use only the positive (‘1’) bits of the entity vector. We formulate two indexes: the Pair Similarity Index (PSI), which measures similarity between two entities, and the Group Similarity Index (GSI), which measures similarity within a group of entities. For each attribute domain we define a threshold factor that depends on the domain but is independent of the number of entities in the group. The similarity measure offers simplicity of storage and efficiency of calculation. We compare our similarity index with other indexes. Experiments with sample data indicate a 48% improvement in group similarity over standard methods, pointing to the potential and merit of the binary approach to clustering and data mining.
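
The binary representation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the PSI or GSI formulas, so the sketch does not reproduce them: `pair_similarity` is a Jaccard-style stand-in that counts shared '1' bits, and `group_similarity` is simply the mean of that stand-in over all pairs. The function names, the example attributes, and the records are all hypothetical, and the per-domain threshold factor is not modeled.

```python
from itertools import combinations


def binarize(records, domains):
    """Turn records into bit vectors with one bit per (attribute, value) pair.

    `domains` maps each attribute to the list of values it may assume.
    """
    columns = [(attr, value) for attr, values in domains.items() for value in values]
    vectors = [
        [1 if record.get(attr) == value else 0 for attr, value in columns]
        for record in records
    ]
    return columns, vectors


def pair_similarity(u, v):
    """Jaccard-style stand-in (NOT the paper's PSI): shared '1' bits over '1' bits in either vector."""
    both = sum(a & b for a, b in zip(u, v))
    either = sum(a | b for a, b in zip(u, v))
    return both / either if either else 0.0


def group_similarity(vectors):
    """Mean pairwise stand-in similarity (NOT the paper's GSI)."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # assumption: a group of one entity is trivially similar to itself
    return sum(pair_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical example data: two attributes, three possible values each.
domains = {"color": ["red", "green", "blue"], "size": ["S", "M", "L"]}
records = [
    {"color": "red", "size": "M"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "L"},
]

columns, vectors = binarize(records, domains)
for record, vector in zip(records, vectors):
    print(record, vector)
print("pair (0, 1):", pair_similarity(vectors[0], vectors[1]))
print("group:", group_similarity(vectors))
```

Because every record sets exactly one bit per attribute, only the '1' bits carry information about an entity, which is why a measure built on positive bits alone is enough to compare two records in this layout.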