Opinion mining from noisy text data

Authors:
Lipika Dey;S K Mirajul Haque
Affiliations:
TCS Innovation Lab Delhi, Udyog Vihar, Gurgaon, India;TCS Innovation Lab Delhi, Udyog Vihar, Gurgaon, India
Venue:
Proceedings of the second workshop on Analytics for noisy unstructured text data
Year:
2008

Abstract

The proliferation of the Internet has generated not only huge volumes of unstructured information in the form of web documents, but also a large amount of text in the form of emails, blogs, and feedback. Data generated from online communication is a potential gold mine for knowledge discovery. Text analytics has matured and is being successfully employed to mine important information from unstructured text documents. Most text analytics methods rely on Natural Language Processing techniques, which assume that the underlying text is clean and correct. Statistical techniques, though not as accurate as linguistic mechanisms, are also employed to overcome this dependence on clean text. The chief bottleneck in designing statistical mechanisms, however, is their dependence on appropriately annotated training data. None of these methodologies is suitable for mining information from online communication text, because such text is often noisy. It is informally written and suffers from spelling mistakes, grammatical errors, improper punctuation, and irregular capitalization. This paper focuses on opinion extraction from noisy text data. It aims to extract and consolidate customer opinions from blogs and feedback at multiple levels of granularity. Ours is a hybrid approach: we first employ a semi-supervised method to learn domain knowledge from a training repository that contains both noisy and clean text, and then employ localized linguistic techniques to extract opinion expressions from noisy text. We have developed a system based on this approach, which provides the user with a platform to analyze opinion expressions extracted from a repository.
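
The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea of extracting opinion expressions from noisy text, not the authors' method. It assumes a tiny hand-written feature list and opinion lexicon (both hypothetical; the paper instead learns domain knowledge with a semi-supervised method), maps misspelled or oddly capitalized tokens to lexicon entries with fuzzy string matching from Python's standard `difflib` module, and pairs features with opinion words only within the same clause as a crude stand-in for "localized" analysis.

```python
import difflib
import re

# Hypothetical toy lexicons, assumed for illustration only.
FEATURES = ["battery", "screen", "camera", "service"]
POLARITY = {"good": 1, "great": 1, "excellent": 1,
            "bad": -1, "poor": -1, "terrible": -1}
OPINION_WORDS = list(POLARITY)


def normalize(token, vocabulary, cutoff=0.8):
    """Map a possibly misspelled token to its closest vocabulary entry, or None."""
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def extract_opinions(text):
    """Return (feature, opinion word, polarity) triples, paired per clause."""
    triples = []
    # Treat punctuation and the word "but" as clause boundaries so that
    # features are only paired with opinion words in the same clause.
    for clause in re.split(r"[.,;!?]|\bbut\b", text, flags=re.IGNORECASE):
        tokens = re.findall(r"[A-Za-z]+", clause)
        features = [f for f in (normalize(t, FEATURES) for t in tokens) if f]
        opinions = [o for o in (normalize(t, OPINION_WORDS) for t in tokens) if o]
        for feature in features:
            for opinion in opinions:
                triples.append((feature, opinion, POLARITY[opinion]))
    return triples


print(extract_opinions("the BATERY is excelent but camra is realy POOR!!"))
# [('battery', 'excellent', 1), ('camera', 'poor', -1)]
```

The similarity cutoff of 0.8 is an arbitrary choice for this example: a lower value tolerates heavier misspellings but risks mapping unrelated words onto lexicon entries.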