Estimation of a Priori Decision Threshold for Collocations Extraction: An Empirical Study

Authors:
Fethi Fkih; Mohamed Nazih Omri
Affiliations:
MARS Research Unit, Faculty of Sciences of Monastir, University of Monastir, Monastir, Tunisia (both authors)
Venue:
International Journal of Information Technology and Web Engineering
Year:
2013

Abstract

Choosing the optimal threshold for collocation extraction remains a manual task performed by experts. To date, no substantial, in-depth study has explored ways to automate learning this threshold in the field of statistical terminology. In this paper, the authors address the problem by first surveying the performance evaluation techniques used in several scientific areas, such as biomedicine and biometrics, and then applying them to statistical terminology. The experimental study yields promising results. First, it shows the effectiveness of standard techniques for evaluating binary classification systems, such as ROC (Receiver Operating Characteristic) and Precision-Recall curves. Second, it provides a practical solution for automatically estimating optimal thresholds for collocation extraction systems.
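The abstract does not state which selection criterion the authors use. The sketch below is therefore only an illustration under stated assumptions. Each candidate word pair is assumed to carry an association score from some statistical measure, and a hand-labelled sample is assumed to mark the true collocations. The threshold is then picked in one of two ways: by maximising Youden's J statistic (TPR minus FPR) on the ROC curve, or by maximising F1 along the Precision-Recall curve. Both are common criteria swapped in here for illustration, not necessarily the paper's method. The function names and the toy data are hypothetical.

```python
"""Sketch: choosing a collocation-extraction threshold from ROC and
Precision-Recall analysis. Pure standard-library Python; the selection
criteria (Youden's J, max F1) are illustrative assumptions."""
from typing import List, Sequence, Tuple


def confusion_at(scores: Sequence[float], labels: Sequence[int], t: float) -> Tuple[int, int]:
    """Count true and false positives when every candidate with score >= t
    is accepted as a collocation (label 1 = collocation, 0 = not)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    return tp, fp


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """(threshold, TPR, FPR) for each distinct score, highest threshold first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp, fp = confusion_at(scores, labels, t)
        points.append((t, tp / pos, fp / neg))
    return points


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve by the trapezoidal rule, starting at (0, 0)."""
    auc, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for _, tpr, fpr in roc_points(scores, labels):
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return auc


def threshold_youden(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Threshold maximising Youden's J = TPR - FPR.
    max() keeps the first maximum, i.e. the highest threshold on ties."""
    return max(roc_points(scores, labels), key=lambda p: p[1] - p[2])[0]


def threshold_max_f1(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """(threshold, F1) for the point on the Precision-Recall curve with the best F1."""
    pos = sum(labels)
    best_t, best_f1 = float("inf"), -1.0  # overwritten on the first iteration
    for t in sorted(set(scores), reverse=True):
        tp, fp = confusion_at(scores, labels, t)
        precision = tp / (tp + fp)  # tp + fp >= 1: the pair scoring exactly t is accepted
        recall = tp / pos
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    # Hypothetical association scores for eight candidate bigrams, with
    # hypothetical expert labels (1 = true collocation).
    scores = [9.1, 7.4, 6.8, 5.0, 4.2, 3.3, 2.1, 1.5]
    labels = [1, 1, 0, 1, 1, 0, 0, 0]
    print("AUC:", roc_auc(scores, labels))                        # 0.875
    print("Youden threshold:", threshold_youden(scores, labels))  # 4.2
    print("Max-F1 threshold:", threshold_max_f1(scores, labels))  # threshold 4.2, F1 ≈ 0.889
```

On this toy sample both criteria agree on 4.2, but in general they need not: Youden's J weighs false positives against all non-collocations, while F1 ignores true negatives, which matters when non-collocations vastly outnumber collocations. Trying every distinct score as a cut-off is quadratic and chosen here for clarity; sorting once and sweeping cumulative counts would scale better to large candidate lists.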