An automatic indexing and neural network approach to concept retrieval and classification of multilingual (Chinese-English) documents

Authors:
Chung-Hsin Lin;Hsinchun Chen
Affiliations:
Dept. of Manage. Inf. Syst., Arizona Univ., Tucson, AZ
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Year:
1996


Abstract

An automatic indexing and concept classification approach to a multilingual (Chinese and English) bibliographic database is presented. We introduced a multi-linear term-phrasing technique to extract concept descriptors (terms or keywords) from a Chinese-English bibliographic database. A concept space of related descriptors was then generated using a co-occurrence analysis technique. Like a man-made thesaurus, the system-generated concept space can be used to suggest additional semantically relevant terms for search. For concept classification and clustering, a variant of a Hopfield neural network was developed to cluster similar concept descriptors and to generate a small number of concept groups that represent (summarize) the subject matter of the database. The authors have adopted the concept space approach to information classification and retrieval in other scientific databases and business applications, but multilingual information retrieval presents a unique challenge. This paper reports our experiment on multilingual databases. Our system was initially developed in the MS-DOS environment, running the ETEN Chinese operating system. For performance reasons, it was then tested on a UNIX-based system. Because of the ideographic nature of the Chinese language, a Chinese term-phrase indexing paradigm that takes these ideographic characteristics into account was developed as a multilingual information classification model. By applying the neural network based concept classification technique, the model presents a novel way of organizing unstructured multilingual information.
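
The co-occurrence and clustering steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The weighting formula (documents containing both descriptors divided by documents containing the first), the threshold-based activation rule, the function names, and the sample descriptors are all assumptions chosen for illustration. The activation loop is only loosely in the spirit of a Hopfield-style relaxation; the abstract does not specify the variant the authors used.

```python
from collections import Counter
from itertools import permutations

# Hypothetical sample data: each document is a list of already-extracted
# concept descriptors (the term-phrasing step is not modelled here).
docs = [
    ["neural network", "神經網路", "classification"],
    ["neural network", "神經網路", "information retrieval"],
    ["information retrieval", "資訊檢索", "thesaurus"],
    ["information retrieval", "資訊檢索"],
]


def build_concept_space(documents):
    """Return {a: {b: weight}} of asymmetric co-occurrence weights.

    Assumed weighting: weight(a -> b) =
        (documents containing both a and b) / (documents containing a).
    """
    doc_freq = Counter()
    pair_freq = Counter()
    for descriptors in documents:
        unique = set(descriptors)
        doc_freq.update(unique)
        # Count every ordered pair (a, b) with a != b once per document.
        pair_freq.update(permutations(sorted(unique), 2))
    space = {term: {} for term in doc_freq}
    for (a, b), both in pair_freq.items():
        space[a][b] = both / doc_freq[a]
    return space


def related_terms(space, term, top_n=3):
    """Suggest the strongest neighbours of a term, thesaurus-style."""
    neighbours = space.get(term, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def spread_activation(space, seeds, threshold=0.5, max_iter=20):
    """Simplified spreading activation over the concept space.

    Starting from the seed terms, a term becomes active when the summed
    weight it receives from active terms reaches the threshold. Iteration
    stops when no new term is activated (an assumed convergence rule).
    """
    active = set(seeds)
    for _ in range(max_iter):
        newly_active = set()
        for term in space:
            if term in active:
                continue
            net_input = sum(space[src].get(term, 0.0) for src in active)
            if net_input >= threshold:
                newly_active.add(term)
        if not newly_active:
            break
        active |= newly_active
    return active


space = build_concept_space(docs)
print(related_terms(space, "neural network"))
print(spread_activation(space, {"資訊檢索"}, threshold=0.9))
```

Because the weights are asymmetric, a rare descriptor can point strongly at a common one without the reverse being true, which is one reason a co-occurrence space can behave like a directed thesaurus. Chinese and English descriptors are treated uniformly here as opaque strings, so terms from both languages can end up in the same group.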