Comparing the Contribution of Syntactic and Semantic Features in Closed versus Open Domain Question Answering

  • Authors:
  • Abolfazl Keighobadi Lamjiri;Leila Kosseim;Thiruvengadam Radhakrishnan

  • Affiliations:
  • Concordia University, Canada;Concordia University, Canada;Concordia University, Canada

  • Venue:
  • ICSC '07 Proceedings of the International Conference on Semantic Computing
  • Year:
  • 2007

Abstract

In this paper, we analyze the contribution of semantic, syntactic, and document word similarity features in closed and open domain question answering. Semantic similarity is computed as the similarity of the action in the candidate sentence to the action asked about in the question, measured using WordNet::Similarity on main verbs. The syntactic similarity feature measures the unifiability of a candidate's parse tree with the question's parse tree. It uses syntactic restrictions as well as lexical measures to compute the unifiability of critical syntactic participants in the parse trees. Finally, the word similarity of the document containing a candidate sentence is computed as the cosine of the angle between the question keywords vector and the document vector. Since the semantic feature is more reliable on content verbs and syntactic similarity is suitable for questions with a subject-verb-object syntactic structure, we only consider questions with a main content verb in our analysis (non-copulative questions). This type comprises 70% of our closed domain and 33% of our open domain test questions. The combination of these three features achieves an MRR of 28% in our closed domain and 23% in the open domain. Our analysis shows that the syntactic feature makes a significant contribution in both open and closed domains. However, the path-based lch semantic similarity measure we used contributes only in our closed domain, probably because of less variation in the vocabulary and topic. The document IR score, on the other hand, contributes more in the open domain, because query keywords are more discriminating in a large document set with a vast vocabulary range.
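
As an illustration of two quantities named in the abstract, the following is a minimal Python sketch of a cosine similarity between a question keyword vector and a document vector, and of the mean reciprocal rank (MRR) used to report results. It is not the authors' implementation. The function names are hypothetical, and the use of raw term counts as vector weights is an assumption, since the abstract does not state the weighting scheme (a tf-idf weighting would be an equally plausible choice).

```python
import math
from collections import Counter


def cosine_similarity(question_keywords, document_tokens):
    """Cosine of the angle between two term-count vectors.

    Assumption: vectors are raw term frequencies; the abstract does not
    specify the weighting actually used.
    """
    q = Counter(question_keywords)
    d = Counter(document_tokens)
    # Dot product over the question terms (missing terms count as 0).
    dot = sum(q[term] * d[term] for term in q)
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    norm_d = math.sqrt(sum(c * c for c in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_q * norm_d)


def mean_reciprocal_rank(first_correct_ranks):
    """MRR over questions.

    Each entry is the 1-based rank of the first correct answer for one
    question, or None when no correct answer was returned (scored as 0).
    """
    if not first_correct_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_correct_ranks if rank)
    return total / len(first_correct_ranks)
```

For example, `mean_reciprocal_rank([1, 2, None])` averages the reciprocal ranks 1, 1/2, and 0 over three questions, so an MRR reported as a percentage is this average multiplied by 100.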