Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units

Authors:
Joaquim Ferreira da Silva; Gaël Dias; Sylvie Guilloré; José Gabriel Pereira Lopes
Venue:
EPIA '99 Proceedings of the 9th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence
Year:
1999

Abstract

The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, and reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, IR and IE. In this paper we propose two new association measures, the Symmetric Conditional Probability (SCP) and the Mutual Expectation (ME), for the extraction of contiguous and non-contiguous MWUs. Both measures are used by a new algorithm, LocalMaxs, that requires neither empirically obtained thresholds nor complex linguistic filters. We assess the results obtained with both measures by comparing them with reference association measures (Specific Mutual Information, Φ², Dice and Log-Likelihood coefficients) over a multilingual parallel corpus. An additional experiment has been carried out over a part-of-speech tagged Portuguese corpus for extracting contiguous compound verbs.
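
The abstract names SCP and LocalMaxs without defining them, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the commonly cited form of SCP for an n-gram, p(w1..wn)^2 divided by the average of p(w1..wi) * p(wi+1..wn) over all split points, and a contiguous-only selection rule: an n-gram is kept when its score is at least that of the two (n-1)-grams it contains and strictly greater than that of every (n+1)-gram containing it. The function names, the use of the token count as the probability normaliser for every n, and the max_n cut-off are all hypothetical choices made for the example. Non-contiguous units and the ME measure are not covered.

```python
from collections import Counter
from fractions import Fraction


def ngram_counts(tokens, max_n):
    """Count every contiguous n-gram of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def scp_f(ngram, counts, total):
    """Assumed SCP form: p(ngram)^2 over the mean product of its two-part splits.

    Fractions keep the arithmetic exact, so ties between scores are real
    ties rather than rounding artefacts.
    """
    n = len(ngram)
    p = Fraction(counts[ngram], total)
    avp = sum(
        Fraction(counts[ngram[:i]], total) * Fraction(counts[ngram[i:]], total)
        for i in range(1, n)
    ) / (n - 1)
    return p * p / avp


def local_maxs(tokens, max_n=3):
    """Return contiguous n-grams (2..max_n) whose score is a local maximum."""
    # (max_n + 1)-grams are counted so the longest candidates have successors.
    counts = ngram_counts(tokens, max_n + 1)
    total = len(tokens)  # assumption: one normaliser for all n-gram lengths
    score = {g: scp_f(g, counts, total) for g in counts if len(g) >= 2}

    # Highest score among the (n+1)-grams that contain each n-gram.
    best_successor = {}
    for g, s in score.items():
        if len(g) >= 3:
            for sub in (g[:-1], g[1:]):
                best_successor[sub] = max(best_successor.get(sub, 0), s)

    units = []
    for g, s in score.items():
        if len(g) > max_n:
            continue
        # Bigrams have no scored antecedents, so only successors are checked.
        antecedents_ok = len(g) == 2 or (s >= score[g[:-1]] and s >= score[g[1:]])
        if antecedents_ok and s > best_successor.get(g, 0):
            units.append(g)
    return units


tokens = "new york is new york".split()
print(local_maxs(tokens, max_n=3))  # → [('new', 'york')]
```

No threshold appears anywhere in the selection step: a candidate is accepted purely by comparison with its shorter and longer neighbours, which is the property the abstract highlights.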