Towards semantic category verification with arbitrary precision

  • Authors:
  • Dmitri Roussinov

  • Affiliations:
  • Department of Computer and Information Sciences, University of Strathclyde, Glasgow, UK

  • Venue:
  • ICTIR'11: Proceedings of the Third International Conference on Advances in Information Retrieval Theory
  • Year:
  • 2011

Abstract

Many tasks related to or supporting information retrieval, such as query expansion, automated question answering, reasoning, or heterogeneous database integration, involve verifying membership in a semantic category (e.g., "coffee" is a drink and "red" is a color, while "steak" is not a drink and "big" is not a color). We present a novel framework that automatically validates membership in an arbitrary semantic category, one not trained a priori, up to a desired level of accuracy. Our approach does not rely on any manually codified knowledge. Instead, it capitalizes on the diversity of topics and word usage in a large corpus (e.g., the World Wide Web). Using TREC factoid questions whose answers are expected to belong to a specific semantic category, we show that a very high level of accuracy can be reached by automatically identifying additional training seeds and training patterns when needed. We develop a specific quantitative validation model that takes uncertainty and redundancy in the training data into account. Ablation studies empirically confirm the importance of the key aspects of our model.
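The abstract does not spell out the validation model. The following is therefore only a minimal sketch of the general idea of corpus-based category verification, and it uses a different, well-known technique: counting Hearst-style lexical patterns (e.g., "drinks such as coffee") that link a candidate word to a category name. The pattern list, the Laplace smoothing parameter `alpha`, the `threshold`, and all function names are hypothetical. Unlike the framework described above, this sketch neither discovers new seeds or patterns automatically nor accounts for redundancy in the data. Its smoothing is only a crude stand-in for handling uncertainty.

```python
import re
from typing import Iterable

# Hypothetical Hearst-style surface patterns linking a candidate {x} to a
# category {c}. Illustrative only; these are NOT the patterns from the paper.
PATTERNS = [
    r"\b{c}s? such as {x}\b",
    r"\b{x} is an? {c}\b",
    r"\b{x} and other {c}s\b",
]


def pattern_hits(candidate: str, category: str, corpus: Iterable[str]) -> int:
    """Count sentences in which at least one pattern links candidate to category."""
    regexes = [
        re.compile(p.format(x=re.escape(candidate), c=re.escape(category)), re.IGNORECASE)
        for p in PATTERNS
    ]
    return sum(1 for sentence in corpus if any(r.search(sentence) for r in regexes))


def mentions(candidate: str, corpus: Iterable[str]) -> int:
    """Count sentences that mention the candidate as a whole word."""
    r = re.compile(rf"\b{re.escape(candidate)}\b", re.IGNORECASE)
    return sum(1 for sentence in corpus if r.search(sentence))


def membership_score(candidate: str, category: str, corpus: list[str], alpha: float = 1.0) -> float:
    """Laplace-smoothed share of the candidate's mentions that match a pattern.

    With little evidence the score stays near 0.5, so sparse counts do not
    produce overconfident decisions (assumed heuristic, not the paper's model).
    """
    hits = pattern_hits(candidate, category, corpus)
    total = mentions(candidate, corpus)
    return (hits + alpha) / (total + 2 * alpha)


def is_member(candidate: str, category: str, corpus: list[str], threshold: float = 0.5) -> bool:
    """Accept the candidate if its smoothed score reaches the (hypothetical) threshold."""
    return membership_score(candidate, category, corpus) >= threshold
```

On a toy corpus containing "We serve drinks such as coffee and tea.", "Coffee is a drink enjoyed worldwide.", "Steak is a dish best served rare." and "I ordered steak and coffee.", `coffee` scores 0.6 for the category `drink` and `steak` scores 0.25. Raising the threshold trades recall for precision, which loosely mirrors the "desired level of accuracy" knob described in the abstract.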