Chinese new word detection from query logs

  • Authors:
  • Yan Zhang;Maosong Sun;Yang Zhang

  • Affiliations:
  • State Key Laboratory on Intelligent Technology and Systems Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China;State Key Laboratory on Intelligent Technology and Systems Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China;Sohu Inc. R&D Center, Beijing, China

  • Venue:
  • ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II
  • Year:
  • 2010

Abstract

Existing work in the literature mostly relies on web pages or other author-centric resources to detect new words, which requires highly complex text processing. This paper instead exploits visitor-centric resources, specifically query logs from a commercial search engine, to detect new words. Because query logs are generated by search engine users and are naturally segmented, the complex text processing can be avoided. Using dynamic time warping, a new word detection algorithm based on trajectory similarity is proposed to distinguish new words in the query logs. Experiments on real-world data sets show the effectiveness and efficiency of the proposed algorithm.
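
The abstract does not spell out how trajectory similarity is computed, so the following is only a minimal sketch of the textbook dynamic time warping distance, not the authors' algorithm. It assumes a "trajectory" is a list of per-day query counts for one candidate word; the function name `dtw_distance` and the sample counts are hypothetical and chosen purely for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Assumption: each sequence is a query-frequency trajectory (e.g. daily
    counts for one candidate word). This is an illustrative sketch only.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # local distance between two points
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical trajectories: the same burst shape, shifted by one day.
early_burst = [0, 0, 1, 5, 9]
late_burst = [0, 1, 5, 9, 9]
flat = [1, 1, 1, 1, 1]

print(dtw_distance(early_burst, late_burst))
print(dtw_distance(early_burst, flat))
```

Because warping allows one point to align with several points in the other sequence, two trajectories with the same shape but a time shift end up closer to each other than to a flat trajectory, which is the property that makes the measure a plausible fit for comparing query-frequency curves.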