The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, and reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, IR and IE. In this paper we propose two new association measures, the Symmetric Conditional Probability (SCP) and the Mutual Expectation (ME), for the extraction of contiguous and non-contiguous MWUs. Both measures are used by a new algorithm, LocalMaxs, that requires neither empirically obtained thresholds nor complex linguistic filters. We assess the results obtained with both measures by comparing them with reference association measures (Specific Mutual Information, Φ², Dice and Log-Likelihood coefficients) over a multilingual parallel corpus. An additional experiment was carried out over a part-of-speech tagged Portuguese corpus to extract contiguous compound verbs.
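To make the idea of an association measure concrete, the following is a minimal sketch for contiguous bigrams only. It assumes the commonly cited bigram form of SCP, p(x,y)² / (p(x)·p(y)), which is the product of the two conditional probabilities p(x|y) and p(y|x); the abstract itself does not give the formula, so this form is an assumption here. Dice and pointwise (specific) mutual information are included for comparison. The function name `bigram_scores` and the toy sentence are hypothetical, and neither ME, non-contiguous units, nor the LocalMaxs selection step is covered.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score every contiguous bigram with three association measures.

    Probabilities are estimated as raw frequency / number of tokens,
    which is a simplifying assumption of this sketch.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), f_xy in bigrams.items():
        f_x, f_y = unigrams[x], unigrams[y]
        scores[(x, y)] = {
            # SCP (assumed bigram form): p(x,y)^2 / (p(x) * p(y));
            # the corpus size cancels out of the ratio.
            "scp": f_xy ** 2 / (f_x * f_y),
            # Dice coefficient: 2 * f(x,y) / (f(x) + f(y)).
            "dice": 2 * f_xy / (f_x + f_y),
            # Pointwise mutual information: log2(p(x,y) / (p(x) * p(y))).
            "pmi": math.log2(n * f_xy / (f_x * f_y)),
        }
    return scores


tokens = "new york is big and new york is old".split()
scores = bigram_scores(tokens)
for pair in [("new", "york"), ("is", "big")]:
    print(pair, scores[pair])
```

In this toy input, "new york" always occurs as a pair, so it receives the maximum SCP and Dice value of 1, whereas "is big" scores lower because "is" also precedes "old".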