EPIA '09 Proceedings of the 14th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence
For Information Retrieval purposes, potentially huge corpora that change in predictable ways must be processed regularly to extract contiguous Multi Word Expressions (MWEs), and the extraction must remain computationally tractable. In this paper we mainly explore the use of Suffix Arrays, together with the SCP association measure and the Smoothed LocalMaxs algorithm. The choice of Suffix Arrays, combined with the construction of auxiliary structures, clearly reduced the time needed to extract multi-word expressions; introducing a limit on the number of words yields linear complexity. Although the methodology is essentially statistical, we show how to handle hybrid extraction mechanisms.
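
As a rough illustration of how a suffix array can supply the n-gram counts that an association measure needs, here is a minimal Python sketch. It is not the paper's implementation. The function names (`build_suffix_array`, `ngram_count`, `scp`) are hypothetical, the suffix array is built naively by sorting, and the auxiliary structures and the Smoothed LocalMaxs selection step are not shown. The abstract gives no formula for SCP, so the sketch assumes the Symmetrical Conditional Probability in its commonly cited form: the squared probability of the whole n-gram divided by the average, over all split points, of the product of the probabilities of the two parts. Probabilities are approximated as count divided by corpus length.

```python
def build_suffix_array(tokens):
    """Word-level suffix array: start positions sorted by the suffix they begin.

    Naive construction for illustration only; it is far from linear time.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def _bound(tokens, sa, ngram, upper):
    """Binary search over suffixes truncated to len(ngram) tokens.

    Returns the first index whose truncated suffix is >= ngram (upper=False)
    or > ngram (upper=True).
    """
    n = len(ngram)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = tokens[sa[mid]:sa[mid] + n]
        if prefix < ngram or (upper and prefix == ngram):
            lo = mid + 1
        else:
            hi = mid
    return lo


def ngram_count(tokens, sa, ngram):
    """Number of occurrences of a contiguous n-gram in the corpus."""
    ngram = list(ngram)
    return _bound(tokens, sa, ngram, True) - _bound(tokens, sa, ngram, False)


def scp(tokens, sa, ngram):
    """Assumed SCP form: p(w1..wn)^2 / mean_i( p(w1..wi) * p(wi+1..wn) )."""
    ngram = list(ngram)
    n = len(ngram)
    if n < 2:
        raise ValueError("SCP needs an n-gram of at least two words")
    total = len(tokens)

    def p(gram):
        # Relative frequency used as a simple probability estimate.
        return ngram_count(tokens, sa, gram) / total

    avg = sum(p(ngram[:i]) * p(ngram[i:]) for i in range(1, n)) / (n - 1)
    return p(ngram) ** 2 / avg if avg else 0.0


if __name__ == "__main__":
    corpus = "new york is big and new york is old".split()
    sa = build_suffix_array(corpus)
    for gram in (["new", "york"], ["is", "big"], ["new", "york", "is"]):
        print(gram, ngram_count(corpus, sa, gram), scp(corpus, sa, gram))
```

Because all occurrences of an n-gram sit in one contiguous block of the suffix array, each count costs two binary searches. Capping the n-gram length, as the abstract suggests, also bounds the work done per corpus position.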