Fast algorithm for assessing semantic similarity of texts

  • Authors: Andrzej Siemiński
  • Affiliations: Institute for Informatics, Technical University of Wrocław, Wybrzeże Wyspiańskiego 27, 53-370 Wrocław, Poland
  • Venue: International Journal of Intelligent Information and Database Systems
  • Year: 2012

Abstract

The paper presents and evaluates an efficient algorithm for measuring the semantic similarity of texts. Calculating the level of semantic similarity between texts is a very difficult task, and the methods proposed so far suffer from high computational complexity, which substantially limits their area of application. The proposed algorithm aims to mitigate this problem by merging a computationally efficient statistical approach to text analysis with a semantic component. The semantic properties of the words in a text are extracted from the WordNet lexical database. The approach was tested using WordNets for two languages, English and Polish, and its basic properties are also studied. The paper concludes with an analysis of the performance of the proposed method on a sample database and suggests some possible application areas.
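The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea it describes: a cheap statistical measure (cosine similarity over word counts) combined with a semantic step that maps words to shared concepts. The `TOY_CONCEPTS` dictionary is a hypothetical stand-in for a WordNet lookup, and all function names are illustrative assumptions rather than the author's method.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a WordNet lookup:
# each word maps to a coarse concept label (an assumption for illustration).
TOY_CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "dog": "canine",
    "puppy": "canine",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two Counter-based term vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexical_similarity(t1, t2):
    """Purely statistical baseline: cosine over raw word counts."""
    return cosine(Counter(tokenize(t1)), Counter(tokenize(t2)))


def semantic_similarity(t1, t2):
    """Statistical measure with a semantic step: each word is replaced
    by its concept label when the toy lexicon knows it."""
    v1 = Counter(TOY_CONCEPTS.get(w, w) for w in tokenize(t1))
    v2 = Counter(TOY_CONCEPTS.get(w, w) for w in tokenize(t2))
    return cosine(v1, v2)
```

In this sketch, "the car stopped" and "the automobile stopped" share only two of three words lexically, but become identical once both nouns are mapped to the same concept. The concept mapping costs one dictionary lookup per word, which is the kind of inexpensive semantic step the abstract alludes to; a real system would need a proper WordNet interface and word sense handling.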