Creation of topic map by identifying topic chain in Chinese

  • Authors:
  • Ching-Long Yeh;Yi-Chun Chen

  • Affiliations:
  • Tatung University, Taipei, Taiwan;Tatung University, Taipei, Taiwan

  • Venue:
  • Proceedings of the 2004 ACM Symposium on Document Engineering
  • Year:
  • 2004

Abstract

XML Topic Maps enable multiple concurrent views of sets of information objects and can be used in different applications, for example thesaurus-like interfaces to corpora, navigational tools for cross-references or citation systems, and information filtering or delivery depending on user profiles. However, enriching the information of a topic map or connecting it with a document's URI is very labor-intensive and time-consuming. To solve this problem, we propose an approach based on natural language processing techniques to identify and extract useful information in raw Chinese text. Unlike most traditional approaches to parsing sentences, which are based on the integration of complex linguistic information and domain knowledge, we work on the output of a part-of-speech tagger and use shallow parsing instead of complex parsing to identify the topics of sentences. The key elements of the centering model of local discourse coherence are employed to extract structures of discourse segments. We use the local discourse structure to solve the problem of zero anaphora in Chinese and then identify the topic, which is the most salient element in a sentence. After we obtain all the topics of a document, we may assign the document to a topic node of the topic map and add the document's information to the topic element at the same time.
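
The pipeline described in the abstract (part-of-speech tags in, shallow inspection of each clause, zero anaphora filled from the local discourse, topics collected per document) can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no rules, so the tag set, the function names, and the heuristic "the first noun before the verb is the topic, and a clause with no such noun inherits the previous topic" are all assumptions chosen to show the general idea. A fuller centering model would rank candidate centers rather than take the first noun.

```python
from collections import Counter

# Hypothetical POS tag sets; a real tagger's labels would replace these.
NOUN_TAGS = {"NN", "NR", "PN"}
VERB_TAGS = {"VV"}


def clause_topic(clause, previous_topic):
    """Pick the topic of one POS-tagged clause.

    Assumed heuristic: the first noun before the first verb is the topic.
    If the verb is reached with no noun before it, the clause is treated
    as having a zero anaphor and inherits the previous clause's topic.
    """
    for word, tag in clause:
        if tag in VERB_TAGS:
            break
        if tag in NOUN_TAGS:
            return word
    return previous_topic


def topic_chain(clauses):
    """Return one topic per clause, carrying topics across zero anaphors."""
    topics = []
    previous = None
    for clause in clauses:
        previous = clause_topic(clause, previous)
        topics.append(previous)
    return topics


def document_topics(clauses, top_n=1):
    """Return the most frequent topics in the chain (assumed selection rule)."""
    counts = Counter(t for t in topic_chain(clauses) if t is not None)
    return [topic for topic, _ in counts.most_common(top_n)]


# Three clauses: "Zhang San bought a book", "(he) went home", "the book is expensive".
example = [
    [("张三", "NR"), ("买", "VV"), ("了", "AS"), ("书", "NN")],
    [("回家", "VV")],
    [("书", "NN"), ("很", "AD"), ("贵", "VA")],
]
print(topic_chain(example))
print(document_topics(example))
```

In this sketch the second clause has no overt subject, so it inherits the topic of the first clause, extending the topic chain. The resulting document topics are the kind of value that could be used to choose a topic node in a topic map, though how that assignment is made is not specified in the abstract.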