Implicit feature identification via hybrid association rule mining

  • Authors:
  • Wei Wang;Hua Xu;Wei Wan

  • Affiliations:
  • State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, ...;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, ...;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, ...

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 12.05

Visualization

Abstract

In sentiment analysis, a finer-grained opinion mining method not only focuses on the view of the product itself, but also focuses on product features, which can be a component or attribute of the product. Previous related research mainly relied on explicit features but ignored implicit features. However, the implicit features, which are implied by some words or phrases, are so significant that they can express the users' opinion and help us to better understand the users' comments. It is a big challenge to detect these implicit features in Chinese product reviews, due to the complexity of Chinese. This paper is mainly centered on implicit features identification in Chinese product reviews. A novel hybrid association rule mining method is proposed for this task. The core idea of this approach is mining as many association rules as possible via several complementary algorithms. Firstly, we extract candidate feature indicators based word segmentation, part-of-speech (POS) tagging and feature clustering, then compute the co-occurrence degree between the candidate feature indicators and the feature words using five collocation extraction algorithms. Each indicator and the corresponding feature word constitute a rule (feature indicator - feature word). The best rules in five different rule sets are chosen as the basic rules. Next, three methods are proposed to mine some possible reasonable rules from the lower co-occurrence feature indicators and non indicator words. Finally, the latest rules are used to identify implicit features and the results are compared with the previous. Experiment results demonstrate that our proposed approach is competent at the task, especially via using several expanding methods. The recall is effectively improved, suggesting that the shortcomings of the basic rules have been overcome to certain extent. Besides those high co-occurrence degree indicators, the final rules also contain uncommon rules.