Public record aggregation using semi-supervised entity resolution

  • Authors:
  • Jack G. Conrad;Christopher Dozier;Hugo Molina-Salgado;Merine Thomas;Sriharsha Veeramachaneni

  • Affiliations:
  • Thomson Reuters Research and Development, Saint Paul, Minnesota (all authors)

  • Venue:
  • Proceedings of the 13th International Conference on Artificial Intelligence and Law
  • Year:
  • 2011

Abstract

This paper describes a highly scalable, state-of-the-art record aggregation system and the backbone infrastructure developed to support it. The system, called PeopleMap, allows legal professionals to explore a broad spectrum of public records databases effectively and efficiently through a single person-centric search. The backbone support system, called Concord, is a toolkit that allows developers to create record resolution solutions economically. PeopleMap is capable of linking billions of public records to a master data set consisting of hundreds of millions of person records. It was constructed using successive applications of Concord to link disparate public record data sets to a central person authority file. To our knowledge, the PeopleMap system is the largest of its kind. Concord, in turn, is a novel record linkage tool that uses a new semi-supervised training technique called "surrogate learning" to enable the rapid development of record resolution solutions.