Object identification is one of the major challenges in integrating data from multiple information sources: without global identifiers, it is hard to find all records that refer to the same object in an integrated database. Traditional object identification techniques typically judge similarity with character-based or vector space model-based measures, but these do not work well on merchandise databases. This paper proposes a new approach to object identification. First, we use merchandise images to judge whether two records refer to the same object; then, we use a Naïve Bayesian model to judge whether two merchandise names have similar meanings. Experiments on data downloaded from shopping websites show good performance.
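
The name-matching step could be sketched roughly as follows. This is a minimal illustration only, not the paper's model: the abstract does not state which features the Naïve Bayesian model uses, so the sketch assumes a hypothetical feature scheme in which every token of a name pair is recorded as either "shared" or "unshared". The names `pair_events` and `NameMatcher`, the toy training pairs, and the Laplace smoothing constant are all assumptions, and the image-comparison step is omitted.

```python
import math
from collections import Counter

# Assumed feature scheme: each token of a name pair is one of two events.
EVENTS = ("shared", "unshared")


def pair_events(name_a, name_b):
    """Turn two merchandise names into a list of 'shared'/'unshared' events."""
    a = set(name_a.lower().split())
    b = set(name_b.lower().split())
    return ["shared"] * len(a & b) + ["unshared"] * len(a ^ b)


class NameMatcher:
    """Two-class multinomial Naive Bayes over pair events (hypothetical)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.class_counts = Counter()
        self.event_counts = {True: Counter(), False: Counter()}

    def fit(self, pairs, labels):
        # Count how often each event occurs in matching / non-matching pairs.
        for (a, b), label in zip(pairs, labels):
            self.class_counts[label] += 1
            self.event_counts[label].update(pair_events(a, b))
        return self

    def log_score(self, name_a, name_b, label):
        # log P(label) + sum over events of log P(event | label)
        counts = self.event_counts[label]
        denom = sum(counts.values()) + self.alpha * len(EVENTS)
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for event in pair_events(name_a, name_b):
            score += math.log((counts[event] + self.alpha) / denom)
        return score

    def same_object(self, name_a, name_b):
        return self.log_score(name_a, name_b, True) > self.log_score(name_a, name_b, False)


# Toy labelled pairs, invented for illustration only.
pairs = [
    ("apple iphone 4 16gb black", "iphone 4 black 16gb apple"),
    ("nokia n8 smartphone", "nokia n8 phone"),
    ("apple iphone 4 16gb", "samsung galaxy s 16gb"),
    ("canon eos 550d camera", "nikon d90 camera"),
]
labels = [True, True, False, False]
matcher = NameMatcher().fit(pairs, labels)
```

Calling `matcher.same_object(name_a, name_b)` returns a boolean decision for a new pair of names. Token overlap is only a crude stand-in for similarity of meaning; a model closer to the paper's intent would need features that capture semantics rather than surface tokens.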