Automatic sentiment analysis using the textual pattern content similarity in natural language

Authors:
Jan Žižka;František Dařena
Affiliations:
Department of Informatics, SoNet Research Center, Faculty of Business and Economics, Mendel University in Brno, Brno, Czech Republic;Department of Informatics, SoNet Research Center, Faculty of Business and Economics, Mendel University in Brno, Brno, Czech Republic
Venue:
TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
Year:
2010

Abstract

The paper investigates the automatic analysis of sentiment (opinion) in natural-language text documents. It assumes that a user has many documents on a given topic that express different opinions about it, and wants to pick out only those that represent a particular sentiment, for example only positive reviews of a certain subject. Because the user has only a few typical patterns (examples) of the desired document type, a tool is needed that collects documents similar to those patterns. The suggested procedure computes the degree of similarity between the patterns and the unlabeled documents, and then ranks the unlabeled documents by that similarity. Similarity is calculated as a distance between patterns and unlabeled items. Results are shown for publicly accessible, real-world data downloaded in two languages, English and Czech.
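
The ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how documents are represented or which distance is used, so everything here is an assumption made for illustration: a plain bag-of-words representation, cosine distance, and scoring each unlabeled document by its distance to the nearest pattern. The function names (`bag_of_words`, `cosine_distance`, `rank_by_similarity`) and the sample texts are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word-frequency vector (an assumed representation)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty document is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_similarity(patterns, unlabeled):
    """Sort unlabeled documents by distance to their nearest pattern.

    Assumes `patterns` is non-empty. Using the nearest pattern is one
    possible choice; averaging over all patterns would be another.
    """
    pattern_vecs = [bag_of_words(p) for p in patterns]
    scored = []
    for doc in unlabeled:
        vec = bag_of_words(doc)
        nearest = min(cosine_distance(vec, p) for p in pattern_vecs)
        scored.append((nearest, doc))
    return sorted(scored)  # smallest distance (most similar) first


# Hypothetical positive-review patterns and unlabeled documents.
patterns = ["great hotel, friendly staff", "excellent location and clean room"]
unlabeled = [
    "dirty room and rude staff",
    "friendly staff and great location",
    "the bus was late",
]
ranked = rank_by_similarity(patterns, unlabeled)
for distance, doc in ranked:
    print(f"{distance:.3f}  {doc}")
```

Note that the negative review still scores closer than the unrelated text because it shares surface words with the patterns. A raw word-overlap distance like this one captures topical similarity more readily than sentiment, so a realistic setup would need more careful term selection or weighting.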