Electricity based external similarity of categorical attributes

Authors:
Christopher R. Palmer;Christos Faloutsos
Affiliations:
Vivisimo, Inc., Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2003

References (5)

The electrical resistance of a graph captures its commute and cover times
STOC '89 Proceedings of the twenty-first annual ACM symposium on Theory of computing

CACTUS—clustering categorical data using summaries
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

Clustering categorical data: an approach based on dynamical systems
The VLDB Journal — The International Journal on Very Large Data Bases

SimRank: a measure of structural-context similarity
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

ROCK: A Robust Clustering Algorithm for Categorical Attributes
ICDE '99 Proceedings of the 15th International Conference on Data Engineering

Cited by (11)

Automatic multimedia cross-modal correlation discovery
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation
IEEE Transactions on Knowledge and Data Engineering

Hierarchical clustering of mixed data based on distance hierarchy
Information Sciences: an International Journal

Relational link-based ranking
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Schema mapping verification: the spicy way
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology

A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Graph nodes clustering with the sigmoid commute-time kernel: A comparative study
Data & Knowledge Engineering

Mutual Information Based Extrinsic Similarity for Microarray Analysis
BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology

TANGENT: a novel, 'Surprise me', recommendation algorithm
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

DISC: data-intensive similarity measure for categorical data
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II

Attribute value weighting in k-modes clustering
Expert Systems with Applications: An International Journal

Abstract

Similarity and distance measures are fundamental components of data mining tools. Categorical attributes abound in databases. The Car Make, Gender, Occupation, etc. fields in an automobile insurance database are very informative. Sadly, categorical data is not easily amenable to similarity computations. A domain expert might manually specify some or all of the similarity relationships, but this is error-prone, infeasible for attributes with large domains, and of no use for cross-attribute similarities, such as between Gender and Occupation. External similarity functions define a similarity between, say, Car Makes by looking at how they co-occur with the other categorical attributes. We exploit a rich duality between random walks on graphs and electrical circuits to develop REP, an external similarity function. REP is theoretically grounded, whereas the only prior work was ad hoc. The usefulness of REP is shown in two experiments. First, we cluster categorical attribute values, showing improved inferred relationships. Second, we use REP effectively as a nearest neighbour classifier.
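
The abstract does not define REP itself, so the following is only an illustrative sketch of the general idea it describes: treat attribute values as nodes of a graph, connect values that co-occur in a record, and read the graph as a resistor network. The function names, the choice of co-occurrence counts as conductances, and the toy records are all assumptions made for this example, not the authors' construction. The sketch computes the effective resistance between two nodes, which is the quantity linked to random-walk commute times in the STOC '89 reference listed on this page; a lower resistance is read here as a higher external similarity.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(records):
    """Return edge weights keyed by a pair of (attribute, value) nodes.

    Assumption: the weight (conductance) of an edge is the number of
    records in which the two attribute values appear together.
    """
    weights = defaultdict(float)
    for record in records:
        nodes = sorted(record.items())
        for u, v in combinations(nodes, 2):
            weights[(u, v)] += 1.0
    return weights


def effective_resistance(weights, source, target):
    """Effective resistance between two nodes of a connected weighted graph.

    The target is grounded (voltage 0), one unit of current is injected at
    the source, and the reduced Laplacian system is solved for the voltages.
    The voltage at the source then equals the effective resistance.
    """
    if source == target:
        return 0.0
    nodes = sorted({node for edge in weights for node in edge})
    free = [node for node in nodes if node != target]
    index = {node: i for i, node in enumerate(free)}
    size = len(free)

    # Augmented matrix [reduced Laplacian | injected current].
    matrix = [[0.0] * (size + 1) for _ in range(size)]
    for (u, v), w in weights.items():
        for a, b in ((u, v), (v, u)):
            if a in index:
                matrix[index[a]][index[a]] += w
                if b in index:
                    matrix[index[a]][index[b]] -= w
    matrix[index[source]][size] = 1.0

    # Gauss-Jordan elimination with partial pivoting.
    # Assumes the graph is connected, so the reduced system is non-singular.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(matrix[r][col]))
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        scale = matrix[col][col]
        matrix[col] = [x / scale for x in matrix[col]]
        for row in range(size):
            if row != col and matrix[row][col] != 0.0:
                factor = matrix[row][col]
                matrix[row] = [x - factor * y
                               for x, y in zip(matrix[row], matrix[col])]
    return matrix[index[source]][size]


# Hypothetical toy data in the spirit of the insurance example.
records = [
    {"make": "Toyota", "occupation": "teacher"},
    {"make": "Honda", "occupation": "teacher"},
    {"make": "Honda", "occupation": "banker"},
    {"make": "Ferrari", "occupation": "banker"},
]
graph = build_cooccurrence_graph(records)
r_near = effective_resistance(graph, ("make", "Toyota"), ("make", "Honda"))
r_far = effective_resistance(graph, ("make", "Toyota"), ("make", "Ferrari"))
```

In this toy data the graph is a simple chain (Toyota, teacher, Honda, banker, Ferrari) of unit resistors, so by the series rule Toyota and Honda are separated by a resistance of 2 and Toyota and Ferrari by 4. Toyota therefore comes out as externally closer to Honda, which shares an occupation with it, than to Ferrari, even though no two Car Makes ever appear in the same record.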