TSD '08 Proceedings of the 11th international conference on Text, Speech and Dialogue
A desirable property of a measure of connective strength in bigrams is insensitivity to corpus size. This paper investigates the stability of three measures across text genres and as the corpus is expanded. The measures are (1) the commonly used mutual information, (2) the difference in mutual information, and (3) raw occurrence. Mutual information is further compared with an approach that uses knowledge about genres to remove the overlap between them. This last approach considers the difference between two products of the same process (human text generation) constrained by different genres. Cancelling the overlap appears to yield the most specific word pairs for each genre.
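As an illustration of the first and third measures, the following minimal sketch computes raw occurrence and pointwise mutual information for adjacent word pairs. It assumes the standard definition PMI(x, y) = log2(P(x, y) / (P(x) P(y))) with probabilities estimated from simple relative frequencies; the abstract does not state the exact formula or estimation used in the paper, and the function name `bigram_pmi` and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): (raw_count, pmi)} for adjacent word pairs.

    Assumption: probabilities are plain relative frequencies, with no
    smoothing and no frequency cutoff.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bigrams          # estimated P(x, y)
        p_x = unigrams[w1] / n_tokens     # estimated P(x)
        p_y = unigrams[w2] / n_tokens     # estimated P(y)
        scores[(w1, w2)] = (count, math.log2(p_xy / (p_x * p_y)))
    return scores


# Hypothetical toy corpus, for illustration only.
toy = "strong tea and strong coffee and weak tea".split()
scores = bigram_pmi(toy)
for pair, (count, pmi) in sorted(scores.items(), key=lambda kv: -kv[1][1]):
    print(pair, count, round(pmi, 3))
```

Because both the counts and the totals change as text is added, rerunning such a function on growing slices of a corpus is one simple way to observe how sensitive each score is to corpus size.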