The problem of entity resolution over probabilistic data (ERPD) arises in many applications that must handle probabilistic data. In many of these applications, the data is distributed across a number of nodes. The simple centralized approach to the ERPD problem scales poorly, because large amounts of data must be sent to a central node. In this paper, we present FD (Fully Distributed), a decentralized algorithm for solving the ERPD problem over distributed data, with the goal of minimizing bandwidth usage and reducing processing time. FD is completely distributed and does not depend on the existence of any particular node. We validated FD through an implementation on a 75-node cluster and through simulation using the PeerSim simulator, with both synthetic and real-world data. Our performance evaluation shows that FD can achieve major performance gains in terms of bandwidth usage and response time.