Forecasting with Twitter data

Authors:
Marta Arias;Argimiro Arratia;Ramon Xuriguera
Affiliations:
Universitat Politècnica de Catalunya, Barcelona, Spain;Universitat Politècnica de Catalunya, Barcelona, Spain;Universitat Politècnica de Catalunya, Barcelona, Spain
Venue:
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Section on Intelligent Mobile Knowledge Discovery and Management Systems and Special Issue on Social Web Mining
Year:
2014

Abstract

The dramatic rise in the use of social network platforms such as Facebook and Twitter has made available vast and growing repositories of user-contributed data. Extracting useful information from this data has become a major challenge in data mining and knowledge discovery. A recently popular approach is to build indicators of general public mood, often in the form of a time series, by means of sentiment analysis. Such indicators have been shown to correlate with a wide variety of phenomena. In this article we follow this line of work and set out to assess rigorously whether a public sentiment indicator extracted from daily Twitter messages can indeed improve the forecasting of social, economic, or commercial indicators. To this end we have collected and processed a large number of Twitter posts from March 2011 to the present date for two very different domains: the stock market and movie box office revenue. For each of these domains, we build and evaluate forecasting models for several target time series, both using and ignoring the Twitter-related data. If Twitter does help, then the predictions of models that use Twitter-related data should be better than those of models that do not. By systematically varying the models and their parameters, together with other tuning factors such as the lag or the way in which we build our Twitter sentiment index, we obtain a large dataset that allows us to test our hypothesis under different experimental conditions. Using a novel decision-tree-based technique that we call summary tree, we mine this large dataset and automatically obtain the configurations that lead to an improvement in the predictive power of our forecasting models.
As a general result, we find that nonlinear models do take advantage of Twitter data when forecasting trends in volatility indices, while linear ones fail systematically when forecasting any kind of financial time series. For predicting box office revenue trends, support vector machines make the best use of Twitter data. In addition, we conduct statistical tests to determine the relation between our Twitter time series and the different target time series.
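
The comparison described in the abstract (the same forecasting model evaluated with and without Twitter-derived inputs, across several lags) can be illustrated with a minimal sketch. This is not the authors' code or data. It assumes synthetic series in which the sentiment index is informative by construction, uses an ordinary least-squares model purely for brevity, and scores trend forecasts by directional accuracy. The function names lagged_matrix and direction_accuracy are hypothetical.

```python
import numpy as np


def lagged_matrix(series, lag):
    """Return rows [s[t-lag], ..., s[t-1]] for t = lag, ..., len(series) - 1."""
    n = len(series)
    return np.column_stack([series[i:n - lag + i] for i in range(lag)])


def direction_accuracy(target, sentiment, lag, use_sentiment, train_frac=0.7):
    """Fit least squares on the first part of the data and report how often
    the predicted sign of the target matches the realised sign on the rest."""
    features = lagged_matrix(target, lag)
    if use_sentiment:
        # Add the lagged sentiment index as extra explanatory variables.
        features = np.hstack([features, lagged_matrix(sentiment, lag)])
    # Prepend an intercept column.
    features = np.hstack([np.ones((features.shape[0], 1)), features])
    y = target[lag:]
    split = int(train_frac * len(y))
    coef, *_ = np.linalg.lstsq(features[:split], y[:split], rcond=None)
    predicted = features[split:] @ coef
    return float(np.mean(np.sign(predicted) == np.sign(y[split:])))


# Synthetic data (an assumption for illustration only): the target depends
# on the previous day's sentiment value plus noise.
rng = np.random.default_rng(0)
n_days = 600
sentiment = rng.normal(size=n_days)
noise = rng.normal(scale=0.5, size=n_days)
target = np.empty(n_days)
target[0] = noise[0]
target[1:] = 0.8 * sentiment[:-1] + noise[1:]

# Vary the lag and compare the two configurations of the same model.
results = {}
for lag in (1, 2, 3):
    without_twitter = direction_accuracy(target, sentiment, lag, use_sentiment=False)
    with_twitter = direction_accuracy(target, sentiment, lag, use_sentiment=True)
    results[lag] = (without_twitter, with_twitter)
    print(f"lag={lag}  without={without_twitter:.3f}  with={with_twitter:.3f}")
```

Each entry of results pairs the accuracy of the model that ignores the sentiment series with that of the model that uses it. Collecting such pairs over many models, parameters, and lags would yield the kind of table of experimental conditions that the abstract describes mining for configurations where Twitter data helps.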