Development of a name translation system using CRAY T94

  • Authors:
  • Wentong Cai;Peng Xu;P. Wu;H. Jyh

  • Venue:
  • Proceedings of High-Performance Computing on the Information Superhighway, HPC-Asia '97
  • Year:
  • 1997

Abstract

Natural language processing (NLP) is an important research direction, since it addresses the needs of the approaching information age. In this paper, we report our study on the problem of translating people's English names into their corresponding Chinese Pinyin names. A name translation system (NTS) has been developed based on statistical approaches. The NTS has two components: dictionary creation and name translation. The dictionary is generated using a statistics-based dictionary generator (SBDG), and the name translation is done using a modified address normalization system (ANS). As in many other NLP applications, the SBDG and ANS suffer from the drawback of requiring extremely large computational resources, in terms of both computation time and memory. To make the NTS fast and feasible, the use of a high-performance computer therefore becomes necessary. The CRAY T94 is a powerful, large-scale, general-purpose parallel-vector supercomputer. In this paper, we first describe the system design of the NTS, and then explain how the NTS is optimized to execute on the CRAY T94. The results we obtained are also discussed. Our experience shows that algorithms and data structures are very important in obtaining optimal performance. The performance monitoring and analysis tools provided by the CRAY T94 programming environment also proved very useful in making optimization decisions. In addition, our study demonstrates that, using the CRAY T94, performance improvements can be achieved not only in the traditional areas of scientific computation but also in NLP applications.