A factuality profiler for eventualities in text

  • Authors: James Pustejovsky; Roser Sauri

  • Affiliations: Brandeis University; Brandeis University

  • Venue: A factuality profiler for eventualities in text
  • Year: 2008

Abstract

Event factuality is the level of information expressing the factual status of eventualities mentioned in text. That is, it conveys whether eventualities are characterized as corresponding to facts, to possibilities, or to situations that do not hold in the world. As such, it touches on two categories more standardly assumed in the literature: modality and evidentiality. Both have been widely discussed in linguistics and philosophy, but only recently have they started to receive attention within the area of NLP. Factuality is a necessary component for reasoning about eventualities in discourse. Inferences derived from events that have not happened, or that are merely possible, are different from those derived from events judged as factual. Factuality is also essential for any task involving temporal ordering: the creation of event timelines needs to take into account the different status of eventualities presented as uncertain or counterfactual. My dissertation aims at designing and developing a factuality profiler, namely a tool devoted to identifying the degree of factuality associated with eventualities mentioned in discourse. Event factuality cannot be conceived independently of language users, who are understood here as the sources of factuality information. Their inclusion in the model is fundamental, since two sources can assign different factuality values to the same event. Because of that, the factuality profiler must be capable of representing different and possibly contradictory information about the factual status of any event. De Facto, the tool I am presenting here, is grounded in the linguistic strategies that speakers employ to signal degrees of factuality in discourse. These involve information at different levels: lexical, syntactic, and rhetorical. De Facto implements an algorithm based on the grammatical structuring of factuality in languages like English, and is informed by a set of linguistic resources compiled through a data-driven approach.
To evaluate De Facto, I created FactBank, a corpus annotated with factuality information. The inter-annotator agreement score for the task of assigning factuality values to events is Cohen's κ = 0.81. Running De Facto against the gold standard results in F1 = 0.74 (macro-averaged), F1 = 0.85 (micro-averaged) and, in terms of inter-annotator agreement, Cohen's κ = 0.72.
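
To make the relation between the three kinds of figures reported above concrete, the following is a minimal sketch of how macro-averaged F1, micro-averaged F1, and Cohen's kappa can be computed from a gold-standard labelling and a system labelling of the same events. It is not De Facto's evaluation code: the label names (`fact`, `possible`, `counterfact`), the toy data, and all function names are assumptions chosen purely for illustration, and they are not FactBank's actual value set.

```python
from collections import Counter

def label_counts(gold, pred, label):
    """Return (tp, fp, fn) for one label over aligned gold/pred lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(gold, pred):
    """Average the per-label F1 scores, so every label weighs the same."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_from_counts(*label_counts(gold, pred, l)) for l in labels) / len(labels)

def micro_f1(gold, pred):
    """Pool TP/FP/FN over all labels first, so frequent labels dominate."""
    labels = set(gold) | set(pred)
    tp = fp = fn = 0
    for l in labels:
        t, f_pos, f_neg = label_counts(gold, pred, l)
        tp, fp, fn = tp + t, fp + f_pos, fn + f_neg
    return f1_from_counts(tp, fp, fn)

def cohen_kappa(a, b):
    """kappa = (p_o - p_e) / (1 - p_e) for two labellings of the same items."""
    n = len(a)
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0

# Hypothetical toy data: ten events, three illustrative factuality labels.
gold = ["fact"] * 6 + ["possible"] * 2 + ["counterfact"] * 2
pred = ["fact"] * 6 + ["possible", "fact"] + ["counterfact"] * 2

if __name__ == "__main__":
    print(f"macro F1 = {macro_f1(gold, pred):.3f}")
    print(f"micro F1 = {micro_f1(gold, pred):.3f}")
    print(f"kappa    = {cohen_kappa(gold, pred):.3f}")
```

On this toy data the single error falls on the rarest label, so the macro average comes out lower than the micro average. A gap in that direction, as in the scores reported above, is what one would expect when errors are concentrated on less frequent values, although the abstract itself does not break the results down by label.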