Syntactic links between content words in meaningful texts are intuitively perceived as 'normal', which ensures text cohesion. Nevertheless, we are not aware of a broadly accepted Internet-based measure of cohesion between words that are syntactically linked in terms of dependency grammars. We propose to measure lexico-syntactic cohesion between content words by means of the Internet, using a specially introduced Stable Connection Index (SCI). SCI is similar to Mutual Information known in statistics, but it does not require iterative evaluation of the total number of Web pages under a search engine's control, and it is insensitive to both fluctuations and slow growth of raw Web statistics. On Russian, Spanish, and English materials, SCI showed concentrated distributions for various types of word combinations; hence lexico-syntactic cohesion acquires a simple numeric measure. It is shown that SCI evaluations can be successfully used for semantic error detection and correction, as well as for information retrieval.
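The abstract does not state the SCI formula, so the following is only a minimal sketch under stated assumptions. It assumes a hypothetical form in which the page count for the word pair is normalized by the geometric mean of the two single-word page counts, with an assumed constant offset of 16. The function names `pmi` and `sci_like` and all counts are illustrative. The sketch contrasts this with pointwise mutual information, which needs the total number of indexed pages as an extra input.

```python
import math


def pmi(n_pair, n_first, n_second, n_total):
    """Pointwise mutual information from page counts.

    Needs n_total, the total number of pages indexed by the search
    engine, which is hard to estimate reliably.
    """
    return math.log2(n_pair * n_total / (n_first * n_second))


def sci_like(n_pair, n_first, n_second, offset=16.0):
    """SCI-style score (assumed form, not taken from the abstract).

    The pair count is divided by the geometric mean of the single-word
    counts, so no estimate of the total page count is required.
    The offset of 16 is an assumption made for this illustration.
    """
    if n_pair == 0:
        return float("-inf")  # the pair never co-occurs
    return offset + math.log2(n_pair / math.sqrt(n_first * n_second))


# Illustrative, made-up page counts for a word pair and its two words.
before = sci_like(100, 1000, 1000)

# If every raw count doubles (slow growth of the index), the ratio
# inside the logarithm is unchanged, so the score stays the same.
after = sci_like(200, 2000, 2000)
```

Under this assumed form, multiplying all three counts by the same factor cancels out inside the logarithm, which is one way a measure can be insensitive to uniform growth of raw Web statistics without any estimate of the total index size.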