Text content reliability estimation in web documents: a new proposal

  • Authors: Luis Sanz; Héctor Allende; Marcelo Mendoza

  • Affiliations: Department of Informatics, Universidad Técnica Federico Santa María, Chile (all three authors)

  • Venue: CICLing'12 Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
  • Year: 2012

Abstract

This paper illustrates how a combination of information retrieval, machine learning, and NLP corpus annotation techniques was applied to the problem of estimating the reliability of text content in Web documents. Our proposal is based on a model in which reliability is a similarity measure between the content of a document and a knowledge corpus. The proposal includes a new representation of text that uses entailment-based graphs. These graph-based representations are then used as training instances for a machine learning algorithm, which builds a reliability model. Experimental results illustrate the feasibility of our proposal through a comparison with a state-of-the-art method.
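
The core idea of the abstract, reliability as a similarity measure between a document and a knowledge corpus, can be illustrated with a minimal sketch. This is not the authors' method: the entailment-based graph representation and the learned model are not described in enough detail here to reproduce, so plain bag-of-words cosine similarity is swapped in as a stand-in. The names `tokenize`, `cosine`, and `reliability_score`, and the choice of taking the maximum similarity over corpus entries, are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reliability_score(document, knowledge_corpus):
    """Hypothetical score: the highest similarity between the document
    and any entry of the knowledge corpus (0.0 for an empty corpus)."""
    doc_vec = Counter(tokenize(document))
    return max(
        (cosine(doc_vec, Counter(tokenize(entry))) for entry in knowledge_corpus),
        default=0.0,
    )


if __name__ == "__main__":
    # Toy knowledge corpus, invented purely for this example.
    corpus = [
        "Aspirin reduces fever and relieves mild pain.",
        "Water boils at one hundred degrees Celsius at sea level.",
    ]
    print(reliability_score("Aspirin relieves pain and reduces fever.", corpus))
    print(reliability_score("Crystals cure every known disease.", corpus))
```

A document that shares vocabulary with the corpus scores higher than one that does not. Lexical overlap is only a rough proxy, which is presumably why the paper turns to entailment-based graphs and a trained model instead.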