Measurements of lexico-syntactic cohesion by means of Internet

  • Authors:
  • Igor A. Bolshakov; Elena I. Bolshakova

  • Affiliations:
  • Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico; Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow, Russia

  • Venue:
  • MICAI'05: Proceedings of the 4th Mexican International Conference on Advances in Artificial Intelligence
  • Year:
  • 2005

Abstract

Syntactic links between content words in meaningful texts are intuitively perceived as ‘normal’ and thus ensure text cohesion. Nevertheless, we are not aware of any broadly accepted Internet-based measure of cohesion between words that are syntactically linked in terms of Dependency Grammars. We propose to measure lexico-syntactic cohesion between content words by means of the Internet with a specially introduced Stable Connection Index (SCI). SCI is similar to Mutual Information, known in statistics, but it does not require repeated evaluation of the total number of Web pages under a search engine's control, and it is insensitive both to fluctuations and to the slow growth of raw Web statistics. On Russian, Spanish, and English material, SCI showed concentrated distributions for various types of word combinations; hence lexico-syntactic cohesion acquires a simple numeric measure. It is shown that SCI evaluations can be successfully used for semantic error detection and correction, as well as for information retrieval.
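The abstract does not give the SCI formula. The sketch below assumes the form reported in the authors' related publications (verify against the full paper): SCI(P,Q) = 16 + log2(N(P,Q) / sqrt(N(P)·N(Q))), where N(P) and N(Q) are search-engine page counts for each word and N(P,Q) is the page count for the word combination. For contrast it also computes pointwise mutual information, which needs the total page count N_total. The function names `sci` and `pmi` and the sample counts are hypothetical, chosen only for illustration.

```python
import math


def sci(n_pq: int, n_p: int, n_q: int) -> float:
    """Stable Connection Index under the ASSUMED form
    16 + log2(N(P,Q) / sqrt(N(P) * N(Q))).

    n_pq: page count for the combination P,Q
    n_p, n_q: page counts for the individual words P and Q
    """
    if n_pq <= 0:
        # No co-occurrence pages: treat cohesion as unmeasurable (minus infinity).
        return float("-inf")
    return 16 + math.log2(n_pq / math.sqrt(n_p * n_q))


def pmi(n_pq: int, n_p: int, n_q: int, n_total: int) -> float:
    """Pointwise mutual information, shown only for comparison.
    Unlike SCI it depends on n_total, the number of indexed pages."""
    return math.log2(n_total * n_pq / (n_p * n_q))


if __name__ == "__main__":
    # Hypothetical counts picked so the arithmetic is exact.
    n_p, n_q, n_pq = 2**20, 2**20, 2**10
    print(sci(n_pq, n_p, n_q))  # 6.0

    # Scale every count by the same factor, as if the index had grown uniformly:
    # the factor cancels in N(P,Q) / sqrt(N(P) * N(Q)), so SCI stays the same.
    k = 3
    print(sci(k * n_pq, k * n_p, k * n_q))  # 6.0

    # PMI additionally needs an estimate of the total number of pages.
    print(pmi(n_pq, n_p, n_q, 2**34))  # 4.0
```

Because the denominator is the geometric mean of the two single-word counts, a common scaling factor applied to all three counts cancels out; this is one way to see why such a measure needs no estimate of the total number of pages and is insensitive to uniform growth of Web statistics, whereas PMI shifts whenever N_total is re-estimated.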