In this paper, the authors present an empirical evaluation of similarity coefficients for binary-valued data. A similarity coefficient measures the similarity or distance between two objects in a dataset whose attributes each take a value of 0 or 1. This is useful in several domains, such as comparing feature vectors in sensor networks, document search, router network mining, and web mining. The authors survey 35 similarity coefficients used in various domains and draw conclusions about the efficacy of the computed similarity under three conditions: (1) labeled data, to quantify the accuracy of the similarity coefficients; (2) varying data density, to evaluate the effect of sparsity; and (3) varying numbers of attributes, to see the effect of high dimensionality on the computed similarity.
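As an illustration of what such a coefficient computes, the sketch below implements two widely known ones, the Jaccard coefficient and the simple matching coefficient, from the standard 2x2 match counts. The abstract does not list the 35 coefficients surveyed, so these two are chosen here only as familiar examples; the function names, and the convention of returning 0.0 when a denominator is zero, are assumptions of this sketch rather than anything taken from the paper.

```python
from typing import Sequence, Tuple


def match_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length 0/1 vectors.

    a: attributes that are 1 in both, b: 1 in x only,
    c: 1 in y only, d: 0 in both.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi not in (0, 1) or yi not in (0, 1):
            raise ValueError("attributes must be 0 or 1")
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient a / (a + b + c); ignores joint absences (d)."""
    a, b, c, _ = match_counts(x, y)
    denom = a + b + c
    # Assumption of this sketch: two all-zero vectors score 0.0.
    return a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / (a + b + c + d)."""
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    # Assumption of this sketch: empty vectors score 0.0.
    return (a + d) / n if n else 0.0


if __name__ == "__main__":
    # Moderately dense pair of objects.
    x = [1, 0, 1, 1, 0, 0]
    y = [1, 1, 0, 1, 0, 0]
    print(jaccard(x, y), simple_matching(x, y))

    # Sparse pair: no shared 1s, but many shared 0s.
    sx = [1] + [0] * 9
    sy = [0, 1] + [0] * 8
    print(jaccard(sx, sy), simple_matching(sx, sy))
```

On the sparse pair, the simple matching coefficient is 0.8 while Jaccard is 0.0, because only the former counts joint absences as agreement. This is one way that data density can change what a coefficient reports, which is the kind of effect the evaluation described in the abstract examines.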