A probabilistic framework for automatic term recognition

Authors:
Wilson Wong;Wei Liu;Mohammed Bennamoun
Affiliations:
School of Computer Science and Software Engineering, University of Western Australia, Crawley, WA, Australia. E-mail: {wilson,wei,bennamou}@csse.uwa.edu.au
Venue:
Intelligent Data Analysis
Year:
2009

Abstract

Term recognition identifies domain-relevant terms, which are essential for discovering domain concepts and for constructing the terminologies required by a wide range of natural language applications. Many techniques have been developed to quantify termhood numerically from term characteristics. Existing techniques have several apparent shortcomings: ad hoc combination of termhood evidence, mathematically unfounded derivation of scores, and implicit assumptions about term characteristics. We propose a probabilistic framework for formalising and combining qualitative evidence, based on explicitly defined term characteristics, to produce a new termhood measure. Our qualitative and quantitative evaluations demonstrate consistently better precision, recall and accuracy than three existing ad hoc measures.