An improved fast algorithm of frequent string extracting with no thesaurus

  • Authors:
  • Yumeng Zhang;Chuanhan Liu

  • Affiliations:
  • Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China and School of Business, Ningbo University, Ningbo, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

  • Venue:
  • MICAI'07 Proceedings of the artificial intelligence 6th Mexican international conference on Advances in artificial intelligence
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Unlisted word identification is the hotspot in the research of Chinese information processing. String frequency statistics is a simple and effective method of extraction unlisted word. Existing algorithm cannot meet the requirement of high speed in vast text processing system. According to strategies of string length increasing and level-wise scanning, this paper presents a fast algorithm of extracting frequent strings and improves string frequency statistical method. The approach does not need thesaurus, and does not need to word segmentation, but according to the average mutual information to identify whether each frequent string is a word. Compared with previous approaches, experiments show that the algorithm gains advantages such as high speed, high accuracy of 91% and above.