An empirical study of massively parallel Bayesian networks learning for sentiment extraction from unstructured text

Authors:
Wei Chen;Lang Zong;Weijing Huang;Gaoyan Ou;Yue Wang;Dongqing Yang
Affiliations:
Key Laboratory of High Confidence Software Technologies, Ministry of Education, School of EECS, Peking University, Beijing, China (all authors)
Venue:
APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
Year:
2011

Abstract

Extracting sentiment from unstructured text has emerged as an important problem in many disciplines, for example in mining online opinions from the Internet. Many algorithms have been applied to this problem, but most of them fail to handle large-scale web data. In this paper, we present a parallel algorithm for Bayesian network (BN) structure learning from large-scale datasets using a MapReduce cluster. We then apply this parallel BN learning algorithm to capture the dependencies among words and, at the same time, to find a vocabulary that is efficient for extracting sentiment. The benefits of using MapReduce for BN structure learning are discussed. The performance of BN-based sentiment extraction is demonstrated on real web blog data. Experimental results on this web data set show that our algorithm selects a parsimonious feature set with substantially fewer predictor variables than the full data set and yields better predictions of sentiment orientation than several commonly used methods.
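
The abstract does not give the details of the parallel algorithm, so the following is only an illustrative sketch of why count-based dependency statistics fit the MapReduce model. It assumes that word-sentiment dependence is scored with mutual information over binary word presence, which is an assumption made for this example and not necessarily the authors' scoring function. The names `map_document`, `reduce_counts`, and `mutual_information` are hypothetical, and the map and reduce phases are simulated in a single Python process instead of on a cluster.

```python
import math
from collections import Counter
from itertools import chain


def map_document(text, label):
    """Map phase (simulated): emit per-document count records."""
    # One record per document so label totals can be recovered later.
    yield (None, label), 1
    # One record per distinct word, i.e. binary presence in the document.
    for word in set(text.lower().split()):
        yield (word, label), 1


def reduce_counts(pairs):
    """Reduce phase (simulated): sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def mutual_information(word, totals, labels):
    """Mutual information (in bits) between word presence and the label."""
    n = sum(totals[(None, y)] for y in labels)       # number of documents
    n_w = sum(totals[(word, y)] for y in labels)     # documents containing word
    mi = 0.0
    for y in labels:
        n_y = totals[(None, y)]
        n_wy = totals[(word, y)]
        # Two cells per label: word present, word absent.
        for joint, marginal in ((n_wy, n_w), (n_y - n_wy, n - n_w)):
            if joint > 0:
                mi += (joint / n) * math.log2(joint * n / (marginal * n_y))
    return mi


# Toy labeled corpus, invented purely for illustration.
docs = [
    ("the plot was good and great", "pos"),
    ("the cast was good fun", "pos"),
    ("the plot was bad and awful", "neg"),
    ("the cast was bad and boring", "neg"),
]
labels = ["pos", "neg"]
totals = reduce_counts(chain.from_iterable(map_document(t, y) for t, y in docs))
vocabulary = {key[0] for key in totals if key[0] is not None}
ranked = sorted(vocabulary, key=lambda w: mutual_information(w, totals, labels),
                reverse=True)
print(ranked[:4])
```

Because each document contributes its counts independently, the map step can run on separate shards of the corpus, and only the summed counts are needed to score candidate dependencies. A full structure learner would also need word-word statistics, which this sketch omits.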