Automatic construction of cross-lingual networks of concepts from the Hong Kong SAR police department

Authors:
Kar Wing Li;Christopher C. Yang
Affiliations:
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong;Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
Venue:
ISI'03 Proceedings of the 1st NSF/NIJ conference on Intelligence and security informatics
Year:
2003



Abstract

The tragic events of September 11 have prompted rapidly growing attention to national security and criminal analysis. In the national security world, very large volumes of data and information are generated and gathered. Much of this data and information, written in different languages and stored in different locations, may appear unconnected. Cross-lingual semantic interoperability is therefore a major challenge in generating an overview of this disparate data and information so that it can be analysed and searched. Traditional information retrieval (IR) approaches normally require a document to share some keywords with the query. In reality, users may use keywords that differ from those used in the documents. There are then two different term spaces, one for the users and another for the documents. Bridging them can be viewed as the creation of a thesaurus that relates terms in one space to terms in the other. Such relationships would allow the system to match queries with relevant documents even when they contain different terms. In addition, terrorists and criminals may communicate through letters, e-mails and faxes in languages other than English, and translation ambiguity significantly exacerbates the retrieval problem. To facilitate cross-lingual information retrieval, a corpus-based approach uses term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model that crosses the language boundary. However, collecting parallel corpora between European and Oriental languages is not easy because of the unique linguistic and grammatical structures of Oriental languages. This paper first presents a text-based approach to aligning English/Chinese Hong Kong Police press release documents from the Web. It then reports an algorithmic approach to generating a robust knowledge base from a statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus.
The research output is a thesaurus-like semantic network knowledge base, which can aid semantics-based cross-lingual information management and retrieval.
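The abstract describes the knowledge base only at a high level. The following is a minimal sketch, not the authors' algorithm, of how co-occurrence statistics over an aligned bilingual corpus can yield a thesaurus-like association network, and how that network could then expand a query across languages. It assumes the press releases are already aligned into English/Chinese pairs and segmented into terms. The function names `build_association_network` and `expand_query`, the asymmetric weight n(a, b) / df(a), the `top_k` and `threshold` parameters, and the toy data are all hypothetical choices for illustration.

```python
from collections import Counter
from itertools import permutations


def build_association_network(aligned_pairs):
    """Build an asymmetric term-association network (hypothetical sketch).

    Each aligned English/Chinese pair is treated as one bilingual unit, so
    English-English, English-Chinese and Chinese-Chinese co-occurrences are
    all counted. The weight of the link a -> b is n(a, b) / df(a): the
    fraction of units containing a that also contain b (an assumed,
    simplified weight, not the paper's formula).
    """
    doc_freq = Counter()  # df(t): number of units containing term t
    co_freq = Counter()   # n(a, b): number of units containing both a and b
    for english_terms, chinese_terms in aligned_pairs:
        unit = set(english_terms) | set(chinese_terms)
        doc_freq.update(unit)
        # permutations yields every ordered pair (a, b) with a != b once per unit
        co_freq.update(permutations(unit, 2))

    network = {}
    for (a, b), n_ab in co_freq.items():
        network.setdefault(a, {})[b] = n_ab / doc_freq[a]
    return network


def expand_query(network, query_terms, top_k=3, threshold=0.5):
    """Add up to top_k associated terms (weight >= threshold) per query term."""
    query = set(query_terms)
    expanded = {}
    for term in query_terms:
        # Strongest links first; ties broken alphabetically for determinism
        neighbours = sorted(network.get(term, {}).items(),
                            key=lambda kv: (-kv[1], kv[0]))
        for candidate, weight in neighbours[:top_k]:
            if weight >= threshold and candidate not in query:
                expanded[candidate] = max(weight, expanded.get(candidate, 0.0))
    return expanded


if __name__ == "__main__":
    # Invented, pre-tokenised toy pairs; not data from the paper
    pairs = [
        (["police", "arrest", "robbery"], ["警方", "拘捕", "搶劫"]),
        (["police", "arrest", "drugs"], ["警方", "拘捕", "毒品"]),
        (["robbery", "bank"], ["搶劫", "銀行"]),
    ]
    net = build_association_network(pairs)
    print(net["robbery"]["搶劫"])  # 1.0
    print(net["robbery"]["bank"])  # 0.5
    print(net["bank"]["robbery"])  # 1.0
    print(expand_query(net, ["bank"]))  # {'robbery': 1.0, '搶劫': 1.0, '銀行': 1.0}
```

The weight is deliberately asymmetric: in the toy corpus "bank" always co-occurs with "robbery" (weight 1.0), but "robbery" co-occurs with "bank" in only half of its units (weight 0.5), so a general term does not pull in every specific term linked to it. A realistic system would also need Chinese word segmentation and some down-weighting of very frequent terms.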