Improving text segmentation with non-systematic semantic relation

  • Authors:
  • Viet Cuong Nguyen; Le Minh Nguyen; Akira Shimazu

  • Affiliations:
  • School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan;School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan;School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan

  • Venue:
  • CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
  • Year:
  • 2011

Abstract

Text segmentation is a fundamental problem in natural language processing, with applications in information retrieval, question answering, and text summarization. Most previous work on unsupervised text segmentation is based on the assumption of lexical cohesion, which is indicated by relations between words in two units of text. However, these approaches take into account only reiteration, a category of lexical cohesion that covers word repetition, synonyms, and superordinates. In this research, we investigate the non-systematic semantic relation, which is classified as collocation in lexical cohesion. This relation holds between two words or phrases in a discourse when they pertain to a particular theme or topic. We recognize this relation via a topic model, which is in turn acquired from a large collection of texts. Experimental results on a public dataset show the advantages of our approach over the available unsupervised approaches.