Sample cutting method for imbalanced text sentiment classification based on BRC

  • Authors:
  • Suge Wang;Deyu Li;Lidong Zhao;Jiahao Zhang

  • Affiliations:
  • School of Computer and Information Technology, Shanxi University, Taiyuan, 030006 Shanxi, China and Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of ...;School of Computer and Information Technology, Shanxi University, Taiyuan, 030006 Shanxi, China and Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of ...;School of Computer and Information Technology, Shanxi University, Taiyuan, 030006 Shanxi, China;School of Mathematics Science, Shanxi University, Taiyuan, 030006 Shanxi, China

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2013

Abstract

The vast volume of subjective text spreading across the Internet has increased the demand for text sentiment classification technology. A well-known factor that often weakens classifier performance is the imbalanced distribution of review texts between the positive and negative classes. In this paper, we address the sentiment classification problem for imbalanced text sets. To this end, we propose the algorithm BRC, which clarifies the disordered boundary by cutting majority-class samples in the dense boundary region. The classifier is built on a Support Vector Machine. To find a better feature weighting scheme, sample-cutting combination strategy, and set of BRC parameters, three groups of experiments are designed on six text sets covering five domains. The experimental results show that the feature weighting scheme Presence performs best, and that the combination strategy BRC+RS offers a tradeoff between the evaluation measures Precision and Recall on the two categories while giving the overall evaluation measure Accuracy a larger increase. It should be noted that the method of determining the parameters α and β in BRC is empirical. Although the boundary region cutting algorithm BRC is aimed at text sentiment classification, we believe that it is also suitable for any two-category classification problem with imbalanced sample data.
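The abstract names a feature weighting scheme (Presence) and three evaluation measures (Precision, Recall, Accuracy) without defining them. The sketch below is illustrative only and is not taken from the paper. It assumes that Presence means binary term occurrence, as the term is commonly used in sentiment classification, and it uses a toy vocabulary and toy labels invented for the example. It also shows why Accuracy alone can mislead on an imbalanced set, which is why per-class Precision and Recall are reported.

```python
from collections import Counter


def presence_vector(tokens, vocabulary):
    """Assumed reading of 'Presence': 1 if the term occurs in the text, else 0."""
    seen = set(tokens)
    return [1 if term in seen else 0 for term in vocabulary]


def frequency_vector(tokens, vocabulary):
    """Raw term-frequency weighting, shown only for contrast."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def per_class_metrics(y_true, y_pred, label):
    """Precision and Recall for one class, treating `label` as the target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy review and vocabulary (hypothetical, for illustration only).
vocabulary = ["good", "bad", "great"]
tokens = "good good great".split()
presence = presence_vector(tokens, vocabulary)
frequency = frequency_vector(tokens, vocabulary)

# Toy imbalanced set: 9 positive, 1 negative; a classifier that always says "pos".
y_true = ["pos"] * 9 + ["neg"]
y_pred = ["pos"] * 10
acc = accuracy(y_true, y_pred)
pos_precision, pos_recall = per_class_metrics(y_true, y_pred, "pos")
neg_precision, neg_recall = per_class_metrics(y_true, y_pred, "neg")
```

In this toy case the always-positive classifier looks strong on Accuracy while its Recall on the minority class is zero. That gap is the kind of imbalance effect that sample-cutting strategies such as BRC are intended to reduce.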