Definition and Formalization of Entity Resolution Functions for Everyday Information Integration

  • Authors:
  • David W. Archer; Lois M. Delcambre

  • Affiliations:
  • Department of Computer Science, Portland State University, Portland OR 97207; Department of Computer Science, Portland State University, Portland OR 97207

  • Venue:
  • Semantics in Data and Knowledge Bases
  • Year:
  • 2008

Abstract

Data integration on a human-manageable scale, by users without database expertise, is a more common activity than integration of large databases. Users often gather fine-grained data and organize it in an entity-centric way, developing tables of information regarding real-world objects, ideas, or people. Often, they do this by copying and pasting bits of data from e-mails, databases, or text files into a spreadsheet. During this process, users evolve their notions of entities and attributes. They combine sets of entities or attributes, split them again, update attribute values, and retract those updates. These functions are neither well supported by current tools, nor formally well understood. Our research seeks to capture and make explicit the data integration decisions made during these activities. In this paper, we formally define entity resolution and de-resolution, and show that these functions behave predictably and intuitively in the presence of attribute value updates.
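To make the operations named in the abstract concrete, the following is a minimal Python sketch of resolving two entities into one row, updating an attribute value, and then de-resolving. It is not the paper's formalization: the `Table` class, its method names, the rule that the first entity's values win on conflict, and the choice to discard updates made to a merged row when it is de-resolved are all assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical entity-centric table: one row of attributes per entity."""
    rows: dict = field(default_factory=dict)    # entity id -> {attribute: value}
    merged: dict = field(default_factory=dict)  # merged id -> saved source rows

    def add(self, eid, **attrs):
        self.rows[eid] = dict(attrs)

    def update(self, eid, attr, value):
        self.rows[eid][attr] = value

    def resolve(self, a, b, new_id):
        # Combine entities a and b into one row, remembering the originals.
        row_a, row_b = self.rows.pop(a), self.rows.pop(b)
        self.merged[new_id] = (a, row_a, b, row_b)
        # Assumed conflict policy: values from a override values from b.
        self.rows[new_id] = {**row_b, **row_a}

    def deresolve(self, merged_id):
        # Undo a resolution by restoring the saved source rows.
        # Assumed policy: updates made to the merged row are discarded.
        a, row_a, b, row_b = self.merged.pop(merged_id)
        del self.rows[merged_id]
        self.rows[a], self.rows[b] = row_a, row_b


table = Table()
table.add("e1", name="D. Archer", email="a@example.org")
table.add("e2", name="David Archer", phone="555")
table.resolve("e1", "e2", "e12")     # one row now stands for both entities
table.update("e12", "phone", "556")  # update applied to the merged entity
table.deresolve("e12")               # e1 and e2 reappear as they were
```

Other policies are equally plausible, for example propagating an update on the merged row back to the source rows on de-resolution. The sketch picks one only to show that the choice has to be made explicitly.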