External validity of sentiment mining reports: Can current methods identify demographic biases, event biases, and manipulation of reviews?

Authors:
Fons Wijnhoven;Oscar Bloemen
Venue:
Decision Support Systems
Year:
2014

Abstract

Many publications in sentiment mining present new techniques that improve the accuracy of extracting features and their corresponding sentiments from texts. For the external validity of sentiment mining reports, i.e., the applicability of their results to target audiences, it is important to analyze carefully the context of the user-generated content and the sample of authors who produced it. The literature lacks an analysis of the external validity of sentiment mining reports, and the sentiment mining field lacks an operationalization of external validity dimensions into practically useful techniques. From a kernel theory, we identify multiple threats to the external validity of sentiment mining and study three of them empirically: (1) a mismatch in the demographics of the reviewer sample, (2) bias due to reviewers' incidental experiences, and (3) manipulation of reviews. We then examine the value of techniques for identifying these threats in cases from Goodreads.com. We conclude that current techniques can detect demographic biases well, although we have doubts about the use of stylometric techniques for this purpose. We demonstrate the usefulness of event and manipulation bias detection techniques in our cases, but this result needs replication in more complex and more competitive contexts. Finally, to increase the decisional usefulness of sentiment mining reports, they should be accompanied by external validity reports, and software and service providers in this field should incorporate these in their offerings.