Crowdsourced comprehension: predicting prerequisite structure in Wikipedia

  • Authors:
  • Partha Pratim Talukdar; William W. Cohen

  • Affiliations:
  • Carnegie Mellon University; Carnegie Mellon University

  • Venue:
  • Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
  • Year:
  • 2012

Abstract

The growth of open-access technical publications and other open-domain textual information sources means that an increasing amount of online technical material is in principle available to all, but in practice incomprehensible to most. We propose to address the task of helping readers comprehend complex technical material by using statistical methods to model the "prerequisite structure" of a corpus --- i.e., the semantic impact of documents on an individual reader's state of knowledge. Experimental results using Wikipedia as the corpus suggest that this task can be approached by crowdsourcing the production of ground-truth labels regarding prerequisite structure, and then generalizing these labels with a learned classifier that combines signals of several kinds. The features we consider relate pairs of pages by analyzing not only the textual content of the pages, but also how the containing corpus is connected and created.
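To make the pairwise-classification idea concrete, here is a minimal sketch of scoring a candidate prerequisite relation between two Wikipedia pages. All names, features, and weights below are illustrative assumptions, not the paper's actual feature set or learned model; the paper trains a classifier on crowdsourced labels, whereas this toy uses hand-set weights over two plausible signals (a link-structure feature and a textual-mention feature).

```python
# Hypothetical sketch: score whether page `a` is a prerequisite for page `b`.
# Pages are plain dicts with illustrative keys: 'title', 'text', 'links'.
# The paper learns classifier weights from crowdsourced labels; the weights
# here are hand-set purely for demonstration.

def pair_features(a, b):
    """Features relating candidate prerequisite `a` to target page `b`."""
    return {
        # link-structure signal: the target page links to the candidate
        "b_links_to_a": 1.0 if a["title"] in b["links"] else 0.0,
        # textual signal: candidate's title is mentioned in the target's text
        "title_in_text": 1.0 if a["title"].lower() in b["text"].lower() else 0.0,
    }

def prereq_score(a, b, weights):
    """Linear score over pair features (a trained classifier in the paper)."""
    feats = pair_features(a, b)
    return sum(weights[name] * value for name, value in feats.items())

# Toy pages (contents invented for illustration)
graph_theory = {"title": "Graph theory", "text": "The study of graphs ...",
                "links": ["Set theory"]}
pagerank = {"title": "PageRank",
            "text": "PageRank is an algorithm grounded in graph theory ...",
            "links": ["Graph theory", "Markov chain"]}

weights = {"b_links_to_a": 1.0, "title_in_text": 0.5}  # illustrative, not learned
print(prereq_score(graph_theory, pagerank, weights))  # 1.5: both signals fire
```

In the paper's setting, the weights would instead be fit to crowdsourced ground-truth labels, and the feature set would combine richer textual, link-graph, and page-creation signals.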