Information Extraction: Distilling Structured Data from Unstructured Text

  • Authors: Andrew McCallum

  • Affiliations: University of Massachusetts, Amherst

  • Venue: Queue - Social Computing
  • Year: 2005

Abstract

In 2001 the U.S. Department of Labor was tasked with building a Web site that would help people find continuing education opportunities at community colleges, universities, and organizations across the country. The department wanted its Web site to support fielded Boolean searches over locations, dates, times, prerequisites, instructors, topic areas, and course descriptions. Ultimately it was also interested in mining its new database for patterns and educational trends. This was a major data-integration project, aiming to automatically gather detailed, structured information from tens of thousands of individual institutions every three months.