Extracting significant words from corpora for ontology extraction

Authors:
Dileep Damle; Victoria Uren
Affiliations:
The Open University, Milton Keynes, UK; The Open University, Milton Keynes, UK
Venue:
Proceedings of the 3rd international conference on Knowledge capture
Year:
2005

Abstract

We present a new method for extracting terms from a domain-relevant corpus using natural language processing, for the purpose of semi-automatic ontology learning. The literature shows that topical words occur in bursts. We find that the ranking of extracted terms is insensitive to the choice of population model, but that calculating frequencies relative to the burst size, rather than to the document length in words, yields significantly different results.
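
The contrast between the two normalizations can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how a burst is measured, so the code assumes, purely for illustration, that a term's burst is the span of tokens from its first to its last occurrence in a document. The function name `term_frequencies` and the sample text are likewise hypothetical.

```python
from collections import defaultdict


def term_frequencies(tokens):
    """Return {term: (count, freq_by_doc_length, freq_by_burst_size)}.

    Assumption (not taken from the paper): the burst of a term is the
    token span from its first to its last occurrence, inclusive.
    """
    # Record every position at which each term occurs.
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    doc_length = len(tokens)
    result = {}
    for term, where in positions.items():
        count = len(where)
        # A term that occurs once has a burst of size 1 under this assumption.
        burst_size = where[-1] - where[0] + 1
        result[term] = (count, count / doc_length, count / burst_size)
    return result


# Hypothetical 13-token document.
tokens = (
    "the ontology maps the ontology terms and the rest is filler text here"
).split()
stats = term_frequencies(tokens)

for term in ("ontology", "the"):
    count, by_doc, by_burst = stats[term]
    print(f"{term}: count={count} by_doc={by_doc:.3f} by_burst={by_burst:.3f}")
```

In this toy document, "ontology" occurs less often than "the" but is more concentrated, so the two terms swap rank when frequency is taken relative to the assumed burst span instead of the document length. That is the kind of difference the abstract reports, although the paper's actual definition of a burst may differ.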