Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
Data Mining and Knowledge Discovery
Clustering mixed numerical and low quality categorical data: significance metrics on a yeast example
Proceedings of the 2nd international workshop on Information quality in information systems
Clustering algorithms presented in the literature are rarely designed for the layered structure of the Internet topology and its unknown number of clusters. A recent approach in the research community is to collect IP path data and map the IPs to known Autonomous Systems (ASs) that form the Internet backbones, resulting in large data sets of AS links. Clustering these data sets of AS links makes it possible to infer business relationships between ASs. We propose the MULICsoft algorithm for this purpose. MULICsoft clusters categorical data sets in which the objects to be clustered are ASs and each categorical attribute value (CA) represents a link between ASs. Each CA has a 'weight' in the range 0.0 to 1.0 that is inversely related to the number of unknown ASs in a link between ASs. MULICsoft produces as many clusters as it can find in the data set, and each cluster consists of layers. The clustering results reflect the Internet topology at the AS level, allowing relationships between ASs to be inferred.
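The weighting idea can be illustrated with a minimal sketch. The abstract states only that each CA weight lies between 0.0 and 1.0 and decreases as the number of unknown ASs in a link grows; it does not give a formula. The function `link_weight` (using 1 / (1 + u)), the `WeightedObject` layout, and the `weighted_similarity` rule of taking the smaller weight on a match are all assumptions made for illustration, not the MULICsoft definitions.

```python
from typing import Dict, Tuple

# An object to be clustered (an AS) maps an attribute name to a
# (categorical value, weight) pair. This layout is an assumption.
WeightedObject = Dict[str, Tuple[str, float]]


def link_weight(unknown_as_count: int) -> float:
    """Hypothetical weight: 1.0 when no AS in the link is unknown,
    shrinking toward 0.0 as more ASs are unknown."""
    if unknown_as_count < 0:
        raise ValueError("unknown_as_count must be non-negative")
    return 1.0 / (1.0 + unknown_as_count)


def weighted_similarity(a: WeightedObject, b: WeightedObject) -> float:
    """Assumed similarity: for every attribute on which both objects
    hold the same value, add the smaller of the two weights."""
    total = 0.0
    for attr, (value_a, weight_a) in a.items():
        if attr in b:
            value_b, weight_b = b[attr]
            if value_a == value_b:
                total += min(weight_a, weight_b)
    return total


# Illustrative objects; the AS labels are made-up placeholders.
as_x: WeightedObject = {
    "link1": ("AS_A", link_weight(0)),
    "link2": ("AS_B", link_weight(1)),
}
as_y: WeightedObject = {
    "link1": ("AS_A", link_weight(1)),
    "link2": ("AS_C", link_weight(0)),
}
similarity_xy = weighted_similarity(as_x, as_y)
```

Under these assumptions, a match on a link with many unknown ASs contributes less to the similarity between two objects than a match on a fully resolved link, which is the intuition behind weighting low-confidence attribute values.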