Identifying novel information using latent semantic analysis in the WiQA task at CLEF 2006

  • Authors:
  • Richard F. E. Sutcliffe;Josef Steinberger;Udo Kruschwitz;Mijail Alexandrov-Kabadjov;Massimo Poesio

  • Affiliations:
  • Documents and Linguistic Technology Group, Department of Computer Science and Information Systems, University of Limerick, Limerick, Ireland;Department of Computer Science and Engineering, University of West Bohemia, Plzen, Czech Republic;Department of Computer Science, University of Essex, Colchester, UK;Department of Computer Science, University of Essex, Colchester, UK;Department of Computer Science, University of Essex, Colchester, UK

  • Venue:
  • CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval
  • Year:
  • 2006

Abstract

In our two-stage system for the English monolingual WiQA Task, snippets were first retrieved if they contained an exact match with the title. Candidates were then passed to the Latent Semantic Analysis component, which judged them Novel if their match with the article text was below a threshold. In Run 1 the ten best snippets were returned, and in Run 2 the twenty best. Run 1 was superior, with an Average Yield per Topic of 2.46 and a Precision of 0.37. Compared to other groups, our performance was in the middle of the range, except for Precision, where our system was the best. We attribute this to our use of exact title matches in the IR stage. In future work we will vary the approach depending on the topic type, exploit co-references in conjunction with exact matches, and make use of the elaborate hyperlink structure that is a unique and most interesting aspect of Wikipedia.
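
The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the raw term-count matrix, the use of the maximum cosine similarity against article sentences as the "match", the default threshold of 0.8, and the ranking of novel snippets by lowest similarity first are all assumptions introduced here. It relies on NumPy for the singular value decomposition.

```python
import re
import numpy as np


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_candidates(title, snippets):
    """Stage 1: keep snippets containing the exact title string.

    Case-insensitive matching is an assumption of this sketch.
    """
    needle = title.lower()
    return [s for s in snippets if needle in s.lower()]


def lsa_similarities(article_sentences, candidates, rank=2):
    """Stage 2 helper: for each candidate, return its highest cosine
    similarity to any article sentence in a rank-k latent space."""
    docs = list(article_sentences) + list(candidates)
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-document matrix of raw counts (weighting scheme is assumed).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            A[index[t], j] += 1.0

    # Truncated SVD: keep the k largest singular values.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(rank, len(S))
    doc_vecs = (np.diag(S[:k]) @ Vt[:k]).T  # one row per document

    # Normalise rows so that dot products are cosine similarities.
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = doc_vecs / norms

    n = len(article_sentences)
    sims = unit[n:] @ unit[:n].T  # candidates x article sentences
    return sims.max(axis=1)


def select_novel(title, article_sentences, snippets,
                 threshold=0.8, top_n=10, rank=2):
    """Return up to top_n candidates whose similarity to the article
    falls below the threshold, least similar first (assumed ordering)."""
    candidates = retrieve_candidates(title, snippets)
    if not candidates or not article_sentences:
        return []
    sims = lsa_similarities(article_sentences, candidates, rank=rank)
    novel = [(sim, s) for sim, s in zip(sims, candidates) if sim < threshold]
    novel.sort(key=lambda pair: pair[0])
    return [s for _, s in novel[:top_n]]
```

Under these assumptions, calling `select_novel` with `top_n=10` or `top_n=20` would correspond loosely to the Run 1 and Run 2 settings. The threshold and the rank of the latent space are free parameters that would need tuning on real data.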