Fast algorithm for assessing semantic similarity of texts

Authors: Andrzej Siemiński
Affiliations: Institute for Informatics, Technical University of Wrocław, Wybrzeże Wyspiańskiego 27, 53-370 Wrocław, Poland
Venue: International Journal of Intelligent Information and Database Systems
Year: 2012

Abstract

The paper presents and evaluates an efficient algorithm for measuring the semantic similarity of texts. Measuring semantic similarity is a difficult task, and the methods proposed so far suffer from high computational complexity, which substantially limits their area of application. The proposed algorithm attempts to mitigate this problem by combining a computationally efficient statistical approach to text analysis with a semantic component. The semantic properties of the words in a text are extracted from the WordNet lexical database. The approach was tested using WordNets for two languages, English and Polish, and its basic properties are also studied. The paper concludes with an analysis of the performance of the proposed method on a sample database and suggests some possible application areas.
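
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: a cheap statistical measure (cosine similarity over word counts) combined with a semantic component that maps words to shared concepts. Everything named here is an assumption made for illustration. `TOY_CONCEPTS` is a hypothetical hand-written stand-in for a WordNet lookup, and `plain_similarity` and `semantic_similarity` are illustrative names, not the author's method.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a WordNet lookup (assumption):
# each word maps to a coarse concept label shared by its near-synonyms.
TOY_CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "dog": "canine", "puppy": "canine",
    "fast": "quick", "quick": "quick", "rapid": "quick",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def plain_similarity(text1, text2):
    """Purely statistical baseline: cosine over raw word counts."""
    return cosine(Counter(tokenize(text1)), Counter(tokenize(text2)))


def semantic_similarity(text1, text2, concepts=TOY_CONCEPTS):
    """Cosine over counts after replacing each known word by its concept."""
    v1 = Counter(concepts.get(tok, tok) for tok in tokenize(text1))
    v2 = Counter(concepts.get(tok, tok) for tok in tokenize(text2))
    return cosine(v1, v2)


if __name__ == "__main__":
    a, b = "a fast car", "a quick automobile"
    print("plain:   ", plain_similarity(a, b))
    print("semantic:", semantic_similarity(a, b))
```

In this sketch the two example texts share only one surface word, so the plain score is low, while the concept mapping makes their vectors coincide and the semantic score is high. The lexicon is passed as a parameter, so a mapping built from another language's WordNet (for example Polish) could in principle be substituted without changing the rest of the code.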