Compilation of a Spanish Representative Corpus

Authors:
Alexander F. Gelbukh; Grigori Sidorov; Liliana Chanona-Hernández
Venue:
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2002

Abstract

Because of Zipf's law, even a very large corpus contains very few occurrences (tokens) of most of its distinct words (types). Only a corpus that contains enough occurrences of even rare words can provide the statistical information needed to study the contextual usage of words. We call such a corpus representative and propose compiling it from the Internet. We describe the corresponding algorithm and its application to Spanish, and discuss different concepts of a representative corpus.
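The Zipf argument above can be made concrete with a short Python sketch. It assumes an idealized Zipf law with exponent s = 1 and illustrative sizes (a 100,000-type vocabulary, a 10-million-token corpus, a threshold of 50 occurrences per word) that are not taken from the paper. The helper `deficient_words` is hypothetical: it only flags the words that would need more contexts, and it does not reproduce the authors' Internet-based compilation algorithm.

```python
from collections import Counter


def zipf_expected_counts(vocab_size: int, n_tokens: int, s: float = 1.0) -> list[float]:
    """Expected token count of each type (by rank) under an idealized Zipf law f(r) ~ 1/r**s."""
    weights = [1.0 / r ** s for r in range(1, vocab_size + 1)]
    total = sum(weights)
    return [n_tokens * w / total for w in weights]


def share_below(counts: list[float], min_tokens: float) -> float:
    """Fraction of types whose (expected) count is below min_tokens."""
    return sum(1 for c in counts if c < min_tokens) / len(counts)


def deficient_words(tokens: list[str], vocabulary: set[str], min_tokens: int) -> list[str]:
    """Hypothetical helper: vocabulary words with fewer than min_tokens occurrences.

    These are the words for which a representative corpus would still need
    additional contexts (in the paper's approach, gathered from the Internet).
    """
    freq = Counter(tokens)
    return sorted(w for w in vocabulary if freq[w] < min_tokens)


if __name__ == "__main__":
    # Illustrative sizes only (assumptions, not figures from the paper).
    counts = zipf_expected_counts(vocab_size=100_000, n_tokens=10_000_000)
    print(f"types with < 50 expected tokens: {share_below(counts, 50):.1%}")

    toy_tokens = "el gato y el perro".split()
    print(deficient_words(toy_tokens, {"el", "gato", "perro", "ratón"}, min_tokens=2))
```

Under these assumptions roughly four out of five types fall below the threshold even in a ten-million-token corpus, which is the gap a representative corpus is meant to close.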