Estimating content concreteness for finding comprehensible documents

  • Authors:
  • Shinya Tanaka;Adam Jatowt;Makoto P. Kato;Katsumi Tanaka

  • Affiliations:
  • Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan

  • Venue:
  • Proceedings of the sixth ACM international conference on Web search and data mining
  • Year:
  • 2013

Abstract

Document comprehensibility is one of the key factors determining document quality and, as a result, user satisfaction. Relevant web pages are of little utility if they are incomprehensible or impose too much cognitive burden on readers. Traditional measures of text difficulty often focus on syntactic features of text, such as sentence length, word length, and syllable count, or they rely on a fixed list of common terms. However, document comprehensibility depends on many factors, of which concreteness and the ease of concept visualization are crucial. In this paper, we first propose a method for predicting the concreteness of terms using SVM regression. We then extend it to calculate document-level concreteness. The experimental results indicate satisfactory accuracy in estimating both term and document concreteness and demonstrate a positive correlation between document concreteness and comprehensibility. Our ultimate goal is to enable comprehension-driven search, which will return results that are both relevant and comprehensible.
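
The step from term concreteness to document concreteness can be illustrated with a minimal sketch. The abstract does not state how term scores are combined, so the plain average used here is an assumption, not the authors' method. The score table, its 1-to-7 scale, and the function names `tokenize` and `document_concreteness` are all hypothetical; in the paper, term scores would come from the SVM regression model.

```python
import re
from statistics import mean

# Hypothetical term-level concreteness scores (higher = more concrete).
# These values are made up for illustration; in the paper they would be
# predicted by an SVM regression model.
TERM_CONCRETENESS = {
    "apple": 6.8,
    "table": 6.5,
    "river": 6.6,
    "freedom": 2.1,
    "theory": 2.4,
    "justice": 2.0,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_concreteness(text, scores=TERM_CONCRETENESS):
    """Average the scores of all tokens that have a known score.

    Assumption: a simple mean over covered tokens. Returns None when
    no token in the text has a score.
    """
    known = [scores[t] for t in tokenize(text) if t in scores]
    return mean(known) if known else None


concrete_doc = "The apple is on the table by the river."
abstract_doc = "A theory of justice and freedom."

# Under these made-up scores, the first text ranks as more concrete.
print(document_concreteness(concrete_doc) > document_concreteness(abstract_doc))
```

Tokens without a score are skipped rather than counted as zero, so function words such as "the" do not pull the document score down. A weighted average or a score over nouns only would be equally plausible choices.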