Extracting nested collocations

Authors:
Katerina T. Frantzi; Sophia Ananiadou
Affiliations:
Manchester Metropolitan University, Manchester, U.K.; Manchester Metropolitan University, Manchester, U.K.
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Year:
1996

Abstract

This paper presents a statistical approach to the semi-automatic extraction of collocations from corpora. The growing availability of large textual corpora and the increasing number of applications of collocation extraction have given rise to various approaches to the topic. We address the problem of nested collocations, that is, collocations that are part of longer collocations. Most approaches to date have treated substrings of collocations as collocations only if they appeared frequently enough by themselves in the corpus, which left many collocations unextracted. We propose an algorithm for the semi-automatic extraction of uninterrupted and interrupted collocations, paying particular attention to nested collocations.
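
The nested-collocation problem described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it only shows why a threshold on standalone frequency can miss a substring of a longer collocation. The function names `find_spans` and `count_nested`, the toy corpus, and the assumed longer collocation are all hypothetical and chosen for illustration.

```python
def find_spans(tokens, phrase):
    """Return (start, end) index pairs where `phrase` occurs contiguously in `tokens`."""
    n = len(phrase)
    return [(i, i + n)
            for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == tuple(phrase)]


def count_nested(sentences, candidate, longer_collocations):
    """Count occurrences of `candidate`, split into nested and standalone.

    An occurrence is 'nested' when it lies entirely inside an occurrence of
    one of the (assumed, already known) longer collocations.
    Returns a tuple (total, nested, standalone).
    """
    total = 0
    nested = 0
    for sentence in sentences:
        tokens = sentence.split()
        long_spans = [span
                      for phrase in longer_collocations
                      for span in find_spans(tokens, phrase)]
        for start, end in find_spans(tokens, candidate):
            total += 1
            if any(ls <= start and end <= le for ls, le in long_spans):
                nested += 1
    return total, nested, total - nested


# Toy corpus (hypothetical): "wall street" mostly appears inside "wall street journal".
SENTENCES = [
    "wall street journal reported gains",
    "the wall street journal said so",
    "wall street journal editors agreed",
    "wall street fell",
]

total, nested, standalone = count_nested(
    SENTENCES,
    ("wall", "street"),
    [("wall", "street", "journal")],
)
print(total, nested, standalone)
```

In this toy corpus the candidate occurs four times in total but only once outside the longer collocation, so a filter that requires, say, two standalone occurrences would discard it even though it is frequent overall.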