Learning concepts from text based on the inner-constructive model

Authors:
Shi Wang;Yanan Cao;Xinyu Cao;Cungen Cao
Affiliations:
Graduate University of Chinese Academy of Sciences, Beijing, China and Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing ...;Graduate University of Chinese Academy of Sciences, Beijing, China and Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing ...;Graduate University of Chinese Academy of Sciences, Beijing, China and Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing ...;Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Venue:
KSEM'07 Proceedings of the 2nd international conference on Knowledge science, engineering and management
Year:
2007

Abstract

This paper presents a new model for the automatic acquisition of lexical concepts from text, referred to as the Concept Inner-Constructive Model (CICM). The CICM captures the rules by which words combine to form concepts in terms of four aspects: (1) parts of speech, (2) syllables, (3) senses, and (4) attributes. First, we extract a large number of candidate concepts using lexico-patterns and confirm a subset of them as concepts if they match enough patterns a sufficient number of times. We then learn CICMs automatically from the confirmed concepts and use the models to identify further concepts. Essentially, the CICM is an instance-based learning model, but it differs from most existing models in that it takes into account a variety of linguistic features of words as well as statistical ones. To make analogy more effective when learning new concepts with CICMs, we cluster similar words based on density. We evaluated the method on a 160G raw corpus, extracting 5,344,982 concepts with a precision of 89.11% and a recall of 84.23%.
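
The candidate-confirmation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the actual lexico-patterns or thresholds, so the English regular expressions in `PATTERNS`, the function name `confirm_concepts`, and the parameters `min_patterns` and `min_hits` are all hypothetical, chosen only to show the idea of confirming a candidate once enough distinct patterns have matched it often enough.

```python
import re
from collections import defaultdict

# Hypothetical English patterns for illustration only; the abstract does not
# specify the lexico-patterns actually used.
PATTERNS = {
    "is_a": re.compile(r"\b(\w+) is a kind of \w+"),
    "such_as": re.compile(r"\bsuch as (\w+)"),
    "and_other": re.compile(r"\b(\w+) and other \w+"),
}


def confirm_concepts(sentences, min_patterns=2, min_hits=2):
    """Split pattern-matched candidates into confirmed and pending sets.

    A candidate is confirmed when at least `min_patterns` distinct patterns
    matched it and the total number of matches is at least `min_hits`.
    Both thresholds are assumptions, not values from the paper.
    """
    # hits[candidate][pattern_name] -> number of matches
    hits = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        text = sentence.lower()
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                hits[match.group(1)][name] += 1

    confirmed = set()
    for candidate, by_pattern in hits.items():
        enough_patterns = len(by_pattern) >= min_patterns
        enough_hits = sum(by_pattern.values()) >= min_hits
        if enough_patterns and enough_hits:
            confirmed.add(candidate)
    pending = set(hits) - confirmed
    return confirmed, pending


# Toy corpus, invented for this example.
SENTENCES = [
    "Granite is a kind of rock.",
    "Hard stones such as granite are used in construction.",
    "Granite and other rocks were quarried here.",
    "Materials such as thing are rarely named.",
]

confirmed, pending = confirm_concepts(SENTENCES)
print("confirmed:", sorted(confirmed))
print("pending:", sorted(pending))
```

In this sketch the confirmed set would play the role of the training instances from which CICMs are learned, and the pending set corresponds to the candidates that the learned models would later have to judge.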