Chinese V-N collocations can hold one of two structural relations: verb-object or attributive-head. Both are widely used in Chinese language processing tasks, but long-distance and low-frequency collocations are often difficult to extract. A weighted mutual information (WMI) model and a rule-based method were designed to acquire V-N collocations by taking more syntactic structure features into account. The WMI model extracted verb-object collocations within clauses. By assigning different weights to collocates at different positions, it reduced interference from illegal collocates and raised the weight of long-distance collocates. The rule-based method used part-of-speech patterns to extract verb-object and attributive-head collocations and to infer implicit collocations. The experiments show that the WMI model improves the evaluation scores of long-distance collocations, while the rule-based method is more accurate at extracting and distinguishing the two kinds of collocations, including low-frequency ones.
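The position-weighting idea can be illustrated with a minimal sketch. The abstract does not give the WMI formula, so everything below is an assumption rather than the authors' model: the function name `weighted_mi`, the `(word, pos)` clause representation, the tags `"v"` and `"n"`, the restriction to nouns that follow the verb inside the same clause, and the `weight` function itself are all hypothetical. The score is ordinary pointwise mutual information computed over a co-occurrence table in which each verb-noun co-occurrence contributes `weight(distance)` instead of 1.

```python
import math
from collections import defaultdict


def weighted_mi(clauses, weight=lambda distance: 1.0):
    """Score verb-noun pairs with PMI over position-weighted co-occurrences.

    clauses: list of clauses, each a list of (word, pos) tuples.
    weight:  hypothetical function mapping verb-to-noun distance (in tokens)
             to a positive weight; the default of 1.0 gives plain PMI.
    """
    pair_w = defaultdict(float)   # weighted count of each (verb, noun) pair
    verb_w = defaultdict(float)   # weighted marginal for each verb
    noun_w = defaultdict(float)   # weighted marginal for each noun
    total = 0.0                   # sum of all weights in the table

    for clause in clauses:
        for i, (verb, verb_pos) in enumerate(clause):
            if verb_pos != "v":
                continue
            # Assumption: only nouns after the verb, within the same clause.
            for j in range(i + 1, len(clause)):
                noun, noun_pos = clause[j]
                if noun_pos != "n":
                    continue
                w = weight(j - i)
                pair_w[(verb, noun)] += w
                verb_w[verb] += w
                noun_w[noun] += w
                total += w

    # log2( P(v, n) / (P(v) * P(n)) ) with probabilities taken from the table.
    return {
        (verb, noun): math.log2(w * total / (verb_w[verb] * noun_w[noun]))
        for (verb, noun), w in pair_w.items()
    }


# Toy corpus: "eat rice", "eat (aspect) one bowl rice", "read book".
toy_clauses = [
    [("吃", "v"), ("饭", "n")],
    [("吃", "v"), ("了", "u"), ("一", "m"), ("碗", "q"), ("饭", "n")],
    [("看", "v"), ("书", "n")],
]

# Hypothetical weighting that grows with distance, so long-distance
# co-occurrences count for more than adjacent ones.
scores = weighted_mi(toy_clauses, weight=lambda d: 1.0 + 0.5 * (d - 1))
```

Passing the weight as a function keeps the scoring separate from the choice of weighting scheme, so different position weightings can be compared on the same corpus without changing the counting logic.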