Foundations of statistical natural language processing
Foundations of statistical natural language processing
Introduction to the special issue on the web as corpus
Computational Linguistics - Special issue on web as corpus
Using the web to obtain frequencies for unseen bigrams
Computational Linguistics - Special issue on web as corpus
Collocation extraction based on modifiability statistics
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Combining Methods for Detecting and Correcting Semantic Hidden Errors in Arabic Texts
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Dependency Language Modeling Using KNN and PLSI
MICAI '09 Proceedings of the 8th Mexican International Conference on Artificial Intelligence
Learning Co-relations of Plausible Verb Arguments with a WSM and a Distributional Thesaurus
CIARP '09 Proceedings of the 14th Iberoamerican Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications
Web-based model for disambiguation of prepositional phrase usage
MICAI'07 Proceedings of the artificial intelligence 6th Mexican international conference on Advances in artificial intelligence
Measurements of lexico-syntactic cohesion by means of internet
MICAI'05 Proceedings of the 4th Mexican international conference on Advances in Artificial Intelligence
Hi-index | 0.00 |
Malapropisms are real-word errors that lead to syntactically correct but semantically implausible text. We report an experiment on detection and correction of Spanish malapropisms. Malapropos words semantically destroy collocations (syntactically connected word pairs) they are in. Thus we detect possible malapropisms as words that do not form semantically plausible collocations with neighboring words. As correction candidates, we select words similar to the suspected one but forming plausible collocations with neighboring words. To judge semantic plausibility of a collocation, we use Google statistics of occurrences of the word combination and of the two words taken apart. Since collocation components can be separated by other words in a sentence, Google statistics is gathered for the most probable distance between them. The statistics is recalculated to a specially defined Semantic Compatibility Index (SCI). Heuristic rules are proposed to signal malapropisms when SCI values are lower than a predetermined threshold and to retain a few highly SCI-ranked correction candidates. Our experiments gave promising results.