Learning a taxonomy from a set of text documents

Authors:
Mari-Sanna Paukkeri;Alberto Pérez García-Plaza;Víctor Fresno;Raquel Martínez Unanue;Timo Honkela
Affiliations:
Aalto University School of Science, Adaptive Informatics Research Centre, P.O. Box 15400, FI-00076 Aalto, Finland;NLP & IR Group, E.T.S.I. Informática, UNED, 28040 Madrid, Spain;NLP & IR Group, E.T.S.I. Informática, UNED, 28040 Madrid, Spain;NLP & IR Group, E.T.S.I. Informática, UNED, 28040 Madrid, Spain;Aalto University School of Science, Adaptive Informatics Research Centre, P.O. Box 15400, FI-00076 Aalto, Finland
Venue:
Applied Soft Computing
Year:
2012

Abstract

We present a methodology for learning a taxonomy from a set of text documents, each of which describes one concept. The taxonomy is obtained by clustering the concept definition documents with a hierarchical approach to the Self-Organizing Map. We compare three feature extraction approaches with varying degrees of language independence: fuzzy logic-based feature weighting and selection, statistical keyphrase extraction, and the traditional tf-idf weighting scheme. The experiments are conducted for English, Finnish, and Spanish. The results show that while the rule-based fuzzy logic systems have an advantage in automatic taxonomy learning, taxonomies can also be constructed with tolerable results using statistical methods that require no domain- or style-specific knowledge.
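
As an illustration of the tf-idf baseline named in the abstract, the following is a minimal sketch of one common tf-idf variant (length-normalized term frequency times log(N / df)) together with cosine similarity between the resulting vectors. It is not the authors' implementation: the function names `tfidf_vectors` and `cosine`, the whitespace tokenization, and the three toy concept definitions are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    # Naive whitespace tokenization (an assumption; real systems use better tokenizers).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy concept-definition documents, invented for illustration only.
docs = [
    "dog is a domestic animal",
    "cat is a domestic animal",
    "car is a road vehicle",
]
vecs = tfidf_vectors(docs)
```

Terms that occur in every document (here "is" and "a") receive zero weight, so the two animal definitions end up closer to each other than either is to the vehicle definition. Vectors of this kind are what a clustering method such as the Self-Organizing Map would take as input.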