VN-KIM IE: automatic extraction of Vietnamese named-entities on the web

  • Authors:
  • Truc-Vien T. Nguyen; Tru H. Cao

  • Affiliations:
  • Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Vietnam; Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Vietnam

  • Venue:
  • New Generation Computing
  • Year:
  • 2007

Abstract

The most compelling advantage of the semantic web would be its capability to understand and process the contents of web pages automatically. Realizing the semantic web involves two main tasks: (1) representation and management of a large amount of data and metadata for web contents; and (2) information extraction and annotation on web pages. On the one hand, named-entity (NE) recognition is regarded as a basic and important problem that must be solved before the deeper semantics of a web page can be extracted. On the other hand, semantic web information extraction is a language-dependent problem that requires language-specific natural language processing techniques. This paper introduces VN-KIM IE, the information extraction module of VN-KIM, the semantic web system that we have developed. The function of VN-KIM IE is to automatically recognize named entities in Vietnamese web pages by identifying their classes and, where they exist, their addresses in the knowledge base of discourse. That information is then annotated on those web pages, providing a basis for NE-based searching on them, in contrast to the current keyword-based searching. The design, implementation, and performance of VN-KIM IE are presented and discussed.
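
To make the idea of annotating named entities with a class and a knowledge-base address concrete, here is a minimal Python sketch. It is not the method used by VN-KIM IE, which the abstract does not describe in detail; it only illustrates the kind of output such a module might produce, using a plain dictionary lookup with greedy longest match. The names `Annotation`, `TOY_KB`, `annotate`, the entity classes, and the `kb:` identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    start: int             # character offset where the mention begins
    end: int               # character offset just past the mention
    text: str              # surface form found in the page text
    ne_class: str          # entity class, e.g. "City"
    kb_id: Optional[str]   # identifier in the knowledge base, if one exists


# Hypothetical toy knowledge base: surface form -> (class, identifier).
# These entries are illustrative only and do not come from VN-KIM.
TOY_KB = {
    "Hồ Chí Minh": ("Person", "kb:person/ho-chi-minh"),
    "Thành phố Hồ Chí Minh": ("City", "kb:city/ho-chi-minh-city"),
    "Hà Nội": ("City", "kb:city/ha-noi"),
}


def annotate(text, kb=TOY_KB):
    """Return annotations found by greedy longest-match lookup of kb names."""
    names = sorted(kb, key=len, reverse=True)  # try longer names first
    annotations = []
    pos = 0
    while pos < len(text):
        for name in names:
            if text.startswith(name, pos):
                ne_class, kb_id = kb[name]
                annotations.append(
                    Annotation(pos, pos + len(name), name, ne_class, kb_id)
                )
                pos += len(name)  # skip past the matched mention
                break
        else:
            pos += 1  # no name starts here; move one character forward
    return annotations
```

Trying longer names first means that "Thành phố Hồ Chí Minh" is annotated as a city rather than as a person's name embedded in a longer phrase. A real system would also need word-boundary checks, Unicode normalization of Vietnamese text (which can appear in composed or decomposed forms), and disambiguation of names that map to more than one entity, none of which this sketch attempts.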