The detection of product duplicates is one of the many challenges facing Web shop product aggregators. This paper presents two new methods for product duplicate detection. Both extend a state-of-the-art approach that detects duplicates using the model words found in product titles. The first method applies several distance measures to product attribute keys and values in order to find duplicates when no matching product title is found. The second method looks for matching model words in all product attribute values, again for the case where no matching product title is found. In experiments on real-world data gathered from two existing Web shops, the second method significantly outperforms the existing state-of-the-art method in terms of F1-measure; the first method also achieves a higher F1-measure, but the improvement is not significant.
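The idea behind the second method can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a "model word" is any token containing both letters and digits (for example `UN55ES6500`), that a product is a dictionary with a `title` string and an `attributes` mapping, and that a single shared model word is enough to call a match. The function names `model_words` and `is_duplicate` are hypothetical, and the title step of the baseline approach is reduced here to a plain model-word overlap.

```python
import re

# Assumption: a model word is a token that mixes letters and digits.
# The abstract does not give the exact definition.
MODEL_WORD = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def model_words(text):
    """Return the lower-cased model words found in a string."""
    return {m.group(0).lower() for m in MODEL_WORD.finditer(text)}


def is_duplicate(p1, p2):
    """Simplified check: titles first, then attribute values as a fallback."""
    # Step 1 (simplified baseline): shared model words in the titles.
    if model_words(p1["title"]) & model_words(p2["title"]):
        return True

    # Step 2 (fallback when titles do not match): shared model words
    # anywhere in the attribute values, ignoring the attribute keys.
    values1 = set().union(*(model_words(v) for v in p1["attributes"].values()))
    values2 = set().union(*(model_words(v) for v in p2["attributes"].values()))
    return bool(values1 & values2)


tv_a = {"title": "Samsung 55-inch LED TV",
        "attributes": {"Model Number": "UN55ES6500"}}
tv_b = {"title": "Samsung Smart TV 55 in.",
        "attributes": {"Model": "un55es6500"}}
tv_c = {"title": "Samsung Smart TV 46 in.",
        "attributes": {"Model": "UN46ES6500"}}

print(is_duplicate(tv_a, tv_b))
print(is_duplicate(tv_a, tv_c))
```

In this toy data the titles of `tv_a` and `tv_b` share no model word, so the decision depends entirely on the attribute values, which is the case the second method targets. Ignoring the attribute keys means the two shops may label the same field differently ("Model Number" versus "Model") without affecting the comparison.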