Efficient clustering of high-dimensional data sets with application to reference matching
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
ASWC '09 Proceedings of the 4th Asian Conference on The Semantic Web
Ontology instance matching by considering semantic link cloud
ACE'10 Proceedings of the 9th WSEAS international conference on Applications of computer engineering
Streaming cross document entity coreference resolution
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Automatically generating data linkages using a domain-independent candidate selection approach
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part I
Hi-index | 0.00 |
A pair of RDF instances are said to corefer when they are intended to denote the same thing in the world, for example, when two nodes of type foaf:Person describe the same individual. This problem is central to integrating and inter-linking semi-structured datasets. We are developing an online, unsupervised coreference resolution framework for heterogeneous, semi-structured data. The online aspect requires us to process new instances as they appear and not as a batch. The instances are heterogeneous in that they may contain terms from different ontologies whose alignments are not known in advance. Our framework encompasses a two-phased clustering algorithm that is both flexible and distributable, a probabilistic multidimensional attribute model that will support robust schema mappings, and a consolidation algorithm that will be used to perform instance consolidation in order to improve accuracy rates over time by addressing data spareness.