Cognitive canonicalization of natural language queries using semantic strata

  • Authors: Suman Deb Roy; Wenjun Zeng
  • Affiliations: University of Missouri; University of Missouri
  • Venue: ACM Transactions on Speech and Language Processing (TSLP)
  • Year: 2014

Abstract

Natural language search relies strongly on perceiving the semantics of a query sentence. Semantics is captured by the relationships among the query words, represented as a network (graph). Such a network of words can be fed into larger ontologies, like DBpedia or Google Knowledge Graph, where it appears as a subgraph, hence the name subnetwork (subnet). Thus, a subnet is a canonical form for interfacing a natural language query to a graph database and is an integral step in graph-based searching. In this article, we present a novel standalone NLP technique that leverages the cognitive psychology notion of semantic strata for semantic subnetwork extraction from natural language queries. The cognitive model describes some of the fundamental structures, called semantic strata, that human cognition employs to construct semantic information in the brain. We propose a computational model based on conditional random fields to capture the cognitive abstraction provided by semantic strata, facilitating cognitive canonicalization of the query. Our experiments, conducted on approximately 5000 queries, suggest that cognitive canonicals based on semantic strata can significantly improve parsing and role labeling performance beyond purely lexical approaches, such as part-of-speech-based techniques. We also find that cognitively canonicalized subnets are more semantically coherent than syntax trees when explored in graph ontologies like DBpedia, and that they improve the ranking of retrieved documents.
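
To make the idea of a subnet appearing as a subgraph of a larger ontology concrete, here is a minimal Python sketch. It is not the authors' method and involves no conditional random field or semantic strata; it only illustrates how a query, once reduced to a small network of labeled edges, could be matched against a graph database. The toy ontology, the relation names, the `?x` variable convention, and the `match` helper are all hypothetical and chosen for illustration.

```python
from typing import List, Set, Tuple

# An edge is (subject, relation, object). All data below is a toy assumption,
# not taken from DBpedia or from the paper.
Edge = Tuple[str, str, str]

ONTOLOGY: Set[Edge] = {
    ("Inception", "directedBy", "Christopher Nolan"),
    ("Inception", "type", "Film"),
    ("Interstellar", "directedBy", "Christopher Nolan"),
    ("Interstellar", "type", "Film"),
    ("Titanic", "directedBy", "James Cameron"),
    ("Titanic", "type", "Film"),
}

# Hypothetical subnet for a query such as "films directed by Christopher Nolan".
# "?x" marks the single unknown node the query asks for.
QUERY_SUBNET: Set[Edge] = {
    ("?x", "directedBy", "Christopher Nolan"),
    ("?x", "type", "Film"),
}


def nodes(graph: Set[Edge]) -> Set[str]:
    """Collect every node that occurs as a subject or object in the graph."""
    return {n for s, _, o in graph for n in (s, o)}


def match(subnet: Set[Edge], ontology: Set[Edge], var: str = "?x") -> List[str]:
    """Return the ontology nodes that, substituted for `var`, make the
    subnet a subgraph (edge subset) of the ontology."""
    hits = []
    for cand in sorted(nodes(ontology)):
        bound = {
            (cand if s == var else s, r, cand if o == var else o)
            for s, r, o in subnet
        }
        if bound <= ontology:  # every bound edge exists in the ontology
            hits.append(cand)
    return hits


if __name__ == "__main__":
    print(match(QUERY_SUBNET, ONTOLOGY))
```

The sketch handles a single variable by brute-force substitution, which is enough to show why a well-formed subnet is a convenient interface to a graph store; real graph databases would use a query language and indexing instead.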