Use of ranked cross document evidence trails for hypothesis generation

  • Authors:
  • Rohini K. Srihari;Li Xu;Tushar Saxena

  • Affiliations:
  • State University of New York at Buffalo;State University of New York at Buffalo;State University of New York at Buffalo

  • Venue:
  • Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2007

Abstract

This paper focuses on detecting how concepts are linked across multiple text documents by generating an evidence trail explaining the connection. A traditional search involving, for example, two or more person names will attempt to find documents mentioning all of these individuals. This research focuses on a different interpretation of such a query: what is the best evidence trail across documents that explains a connection between these individuals? For example, they may all be good golfers. A generalization of this task involves query terms representing general concepts (e.g., indictment, foreign policy). Such queries reflect a special case of text mining. Previous attempts to solve this problem have focused on graph approaches involving hyperlinked documents and on link analysis tools exploiting named entities. A new robust framework is presented, based on (i) generating concept chain graphs, a hybrid content representation, (ii) performing graph matching to select candidate subgraphs, and (iii) subsequently using graphical models to validate hypotheses using ranked evidence trails. We adapt the DUC data set for cross-document summarization to evaluate evidence trails generated by this approach.
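The abstract does not spell out how evidence trails are scored. As a toy illustration of what a "ranked evidence trail" between two query terms could look like, the sketch below enumerates simple paths in a small weighted concept graph and ranks them by the product of edge weights, recording which document supports each hop. This is not the paper's method (it omits concept chain graph construction, graph matching, and graphical-model validation); the graph layout, the weights, and the helper names `add_link` and `ranked_trails` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical edge: (neighbor concept, association weight in (0, 1], supporting document id)
Edge = Tuple[str, float, str]
Graph = Dict[str, List[Edge]]


def add_link(graph: Graph, a: str, b: str, weight: float, doc: str) -> None:
    """Add an undirected, document-labelled link between two concepts."""
    graph.setdefault(a, []).append((b, weight, doc))
    graph.setdefault(b, []).append((a, weight, doc))


def ranked_trails(graph: Graph, source: str, target: str,
                  max_hops: int = 3, k: int = 5) -> List[Tuple[float, List[str], List[str]]]:
    """Return up to k simple paths (score, concepts, documents) from source to
    target, ranked by the product of edge weights (higher = stronger evidence)."""
    results: List[Tuple[float, List[str], List[str]]] = []

    def dfs(node: str, path: List[str], docs: List[str], score: float) -> None:
        if node == target:
            results.append((score, list(path), list(docs)))
            return
        if len(path) - 1 >= max_hops:  # stop once the trail uses max_hops edges
            return
        for nbr, weight, doc in graph.get(node, []):
            if nbr in path:  # keep trails simple (no revisited concepts)
                continue
            path.append(nbr)
            docs.append(doc)
            dfs(nbr, path, docs, score * weight)
            path.pop()
            docs.pop()

    dfs(source, [source], [], 1.0)
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:k]


if __name__ == "__main__":
    # Toy data: two people connected through shared concepts in different documents.
    g: Graph = {}
    add_link(g, "Person A", "golf", 0.9, "d1")
    add_link(g, "Person B", "golf", 0.8, "d2")
    add_link(g, "Person A", "charity event", 0.5, "d3")
    add_link(g, "charity event", "Person B", 0.4, "d4")
    for score, concepts, docs in ranked_trails(g, "Person A", "Person B"):
        print(f"{score:.2f}", " -> ".join(concepts), docs)
```

With this toy graph, the trail through "golf" (supported by d1 and d2) outranks the one through "charity event", mirroring the abstract's golfer example. The multiplicative score is only one plausible choice; it penalizes long trails and weak links, which a real system would likely replace with a learned or probabilistic measure.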