Frequency, collocation, and statistical modeling of lexical items: a case study of temporal expressions in an elderly speaker corpus

Authors:
Sheng-Fu Wang; Jing-Chen Yang; Yu-Yun Chang; Yu-Wen Liu; Shu-Kai Hsieh
Affiliations:
National Taiwan University; National Taiwan University; National Taiwan University; National Taiwan Normal University; National Taiwan University
Venue:
ROCLING '11 Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing
Year:
2011

Abstract

This study examines how different dimensions of corpus frequency data affect the outcome of statistical modeling of lexical items. The corpus analyzed is an elderly speaker corpus in the early stages of development, and the target words are temporal expressions, which may reveal how the speech produced by elderly speakers is organized. We conduct divisive hierarchical clustering on two dimensions of the corpus data: raw frequency distributions and collocation-based vectors. The results show that the target terms are clustered differently depending on which dimension is used as input. Specifically, statistically based collocational analysis yields more distinct clusters that differentiate the temporal terms more finely than clustering based on raw frequency.
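
To make the comparison concrete, the following is a minimal sketch of divisive hierarchical clustering run on two different vector representations of the same terms. It is not the authors' implementation: the abstract does not state the distance measure, the splitting procedure, or the association statistic, so the sketch assumes cosine distance and a DIANA-style splinter split. The term labels (`t1` to `t4`) and all numbers are invented placeholders, not data from the corpus.

```python
from math import sqrt

def cosine_distance(u, v):
    """1 - cosine similarity (assumed measure; the abstract names none)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean_distance(item, group, vectors):
    """Average distance from item to the members of group other than itself."""
    others = [g for g in group if g != item]
    if not others:
        return 0.0
    total = sum(cosine_distance(vectors[item], vectors[o]) for o in others)
    return total / len(others)

def diameter(cluster, vectors):
    """Largest pairwise distance inside a cluster."""
    return max(
        (cosine_distance(vectors[a], vectors[b]) for a in cluster for b in cluster),
        default=0.0,
    )

def split(cluster, vectors):
    """DIANA-style split: seed a splinter group with the most dissimilar
    item, then move over items that sit closer to the splinter group."""
    remaining = list(cluster)
    seed = max(remaining, key=lambda w: mean_distance(w, remaining, vectors))
    remaining.remove(seed)
    splinter = [seed]
    while len(remaining) > 1:
        gains = {
            w: mean_distance(w, remaining, vectors)
            - mean_distance(w, splinter, vectors)
            for w in remaining
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        remaining.remove(best)
        splinter.append(best)
    return splinter, remaining

def divisive_clustering(vectors, k):
    """Repeatedly split the cluster with the largest diameter until k clusters."""
    clusters = [list(vectors)]
    while len(clusters) < k:
        widest = max(clusters, key=lambda c: diameter(c, vectors))
        if len(widest) < 2:
            break
        clusters.remove(widest)
        clusters.extend(split(widest, vectors))
    return clusters

# Invented toy data: the same four hypothetical terms under two views.
freq_vectors = {  # raw frequency per (hypothetical) sub-corpus
    "t1": [10, 2], "t2": [9, 3], "t3": [1, 8], "t4": [2, 9],
}
colloc_vectors = {  # association scores with (hypothetical) collocates
    "t1": [1.0, 0.0, 0.0], "t2": [0.0, 1.0, 0.0],
    "t3": [0.9, 0.1, 0.0], "t4": [0.0, 0.9, 0.2],
}

freq_clusters = divisive_clustering(freq_vectors, 2)
colloc_clusters = divisive_clustering(colloc_vectors, 2)
print(freq_clusters)
print(colloc_clusters)
```

In this toy setup the two representations group the same terms differently, which is the kind of contrast the abstract reports. Which grouping is more informative for real temporal expressions depends on the corpus and on the association measure chosen.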