Staging a realistic entity resolution challenge for students

  • Authors: Yinle Zhou; John Talburt
  • Affiliations: University of Arkansas at Little Rock, Little Rock, AR (both authors)
  • Venue: Journal of Computing Sciences in Colleges
  • Year: 2011

Abstract

This paper describes the experience of constructing and deploying a substantial exercise in entity resolution (ER) designed to simulate more closely the challenges often encountered in real-world data integration projects. The exercise is based on a consistent set of synthetically generated demographic data that have been separated and disrupted in a controlled manner. The resulting datasets are large enough (several thousand records) to give students a significant challenge, yet small enough to be managed within a semester course using tools that run on a desktop platform. Because the starting state of the integrated data is known, student progress in re-integrating the data can be readily and objectively measured. This gives students feedback on their progress and lets them assess the effectiveness of the different strategies and approaches they try. The details given here are based on the experience of conducting the ER challenge on three occasions in two different courses.
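
The abstract does not say which measure was used to score student results. As an illustration only, the sketch below assumes a common choice for ER evaluation: pairwise precision, recall, and F-measure computed against the known true grouping. The function names, record identifiers, and cluster labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def linked_pairs(assignment):
    """Return the set of unordered record pairs placed in the same cluster.

    `assignment` maps a record id to a cluster (entity) label.
    """
    clusters = {}
    for record_id, label in assignment.items():
        clusters.setdefault(label, []).append(record_id)
    pairs = set()
    for members in clusters.values():
        # Sorting makes each pair canonical, so (a, b) and (b, a) coincide.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(truth, submission):
    """Score a submitted clustering against the known true clustering."""
    true_pairs = linked_pairs(truth)
    submitted_pairs = linked_pairs(submission)
    correct = len(true_pairs & submitted_pairs)
    # Convention assumed here: an empty pair set scores 1.0 rather than failing.
    precision = correct / len(submitted_pairs) if submitted_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy data: five records that truly belong to two entities.
truth = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B"}
# A student's attempt: one correct link (r1, r2) and one false link (r3, r4).
submission = {"r1": "x", "r2": "x", "r3": "y", "r4": "y", "r5": "z"}

precision, recall, f_measure = pairwise_scores(truth, submission)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

Rerunning a scorer of this kind after each change to the matching rules is one way students could compare strategies. Precision falls when records are over-merged, and recall falls when true matches are missed.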