Course-specific search engines: semi-automated methods for identifying high quality topic-specific corpora

  • Authors: Neel Guha; Matt Wytock

  • Affiliations: Henry M. Gunn High School, Palo Alto, CA, USA; Carnegie Mellon University, Pittsburgh, PA, USA

  • Venue: Proceedings of the 22nd international conference on World Wide Web companion
  • Year: 2013

Abstract

Web search is an important research tool for many high school courses. However, generic search engines do not understand the context of a search (the high school course), leading to results that are off-topic or inappropriate as reference material. In this paper, we introduce the concept of a course-specific search engine and build one for the Advanced Placement US History (APUSH) course; subject matter experts (high school teachers) prefer its results over those of existing search engines. This reference search engine for APUSH relies on a hand-curated set of sites picked specifically for this educational context. To automate this expensive curation process, we describe two algorithms for identifying high-quality topical sites using an authoritative source such as a textbook: one based on textual similarity and another using structured data from knowledge bases. Initial experimental results indicate that these algorithms can successfully classify high-quality documents, leading to the automatic creation of topic-specific corpora for any course.
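
The abstract does not spell out how the textual-similarity algorithm works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain bag-of-words cosine similarity between an authoritative text (such as a textbook excerpt) and a candidate page. The function names and the `threshold` value are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_page(textbook_text, page_text):
    """Score a candidate page against the authoritative text."""
    return cosine_similarity(
        Counter(tokenize(textbook_text)), Counter(tokenize(page_text))
    )


def is_on_topic(textbook_text, page_text, threshold=0.2):
    """Hypothetical decision rule: the threshold is an assumption, not a value from the paper."""
    return score_page(textbook_text, page_text) >= threshold


textbook = "The American Revolution began in 1775 when colonists resisted British taxation."
on_topic_page = "Colonists resisted British taxation before the American Revolution."
off_topic_page = "Cheap flights and hotel deals for your summer vacation."

print(is_on_topic(textbook, on_topic_page))
print(is_on_topic(textbook, off_topic_page))
```

A realistic system would likely weight terms (for example with TF-IDF), remove stop words, and aggregate page scores per site. Those refinements are left out here to keep the sketch short.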