Extracting Chinese product features: representing a sequence by a set of skip-bigrams

  • Authors:
  • Ge Xu; Chu-Ren Huang; Houfeng Wang

  • Affiliations:
  • Faculty of Humanities, The Hong Kong Polytechnic University, Hong Kong, Institute of Computational Linguistics, Peking University, Beijing, China, Department of Computer Science, MinJiang Universi ...; Faculty of Humanities, The Hong Kong Polytechnic University, Hong Kong; Institute of Computational Linguistics, Peking University, Beijing, China

  • Venue:
  • CLSW'12 Proceedings of the 13th Chinese conference on Chinese Lexical Semantics
  • Year:
  • 2012

Abstract

A skip-bigram is a bigram that allows skips between its words. In this paper, we use a set of skip-bigrams (an SBGSet) to represent a short word sequence, which is the typical form of a product feature. The advantage of the SBGSet representation is that a word sequence can be converted to a set and back. Under the SBGSet representation, we can apply association rule mining to find frequent itemsets, from which frequent product features can be extracted. Infrequent product features are extracted with a pattern-based method. A pattern is also represented by an SBGSet and contains a variable that can be instantiated to a product feature. We evaluate our method on two data sets. The experimental results show that our method is suitable for extracting Chinese product features and that the pattern-based method is effective for extracting infrequent product features.
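
The abstract does not spell out how an SBGSet is built, so the following Python sketch is only one plausible reading: every ordered pair of words from the sequence counts as a skip-bigram, and the sequence is recovered by counting how often each word appears in first position. The function names `to_sbgset` and `to_sequence`, the optional `max_skip` limit, and the sample words are illustrative assumptions, not taken from the paper. The reverse conversion here assumes that all words in the sequence are distinct, that there are at least two of them, and that no skip limit was applied.

```python
from collections import Counter
from itertools import combinations


def to_sbgset(words, max_skip=None):
    """Return the set of skip-bigrams (ordered word pairs) of a word sequence.

    max_skip is a hypothetical parameter: the largest number of words that
    may be skipped between the two members of a pair (None means no limit).
    """
    return {
        (words[i], words[j])
        for i, j in combinations(range(len(words)), 2)
        if max_skip is None or j - i - 1 <= max_skip
    }


def to_sequence(sbgset):
    """Rebuild the word sequence from an unrestricted SBGSet.

    Assumes distinct words: the word at position i of an n-word sequence
    is the first member of exactly n - 1 - i pairs, so sorting by that
    count in descending order restores the original order.
    """
    first_counts = Counter(first for first, _ in sbgset)
    words = {word for pair in sbgset for word in pair}
    return sorted(words, key=lambda word: -first_counts[word])


# Hypothetical product feature made of three words ("battery", "endurance", "time").
feature = ["电池", "续航", "时间"]
sbgset = to_sbgset(feature)
print(to_sequence(sbgset))  # ['电池', '续航', '时间']
```

Because each review sentence becomes a plain set of pairs under this reading, it can be handed to a standard frequent-itemset miner as a transaction, which is the role association rule mining plays in the abstract.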