The social sciences strive to understand the political, social, and cultural world around us, but have been hampered by limited access to the kind of quantitative data sources enjoyed by the hard sciences. Careful analysis of Web document streams holds enormous potential to solve longstanding problems across a variety of social science disciplines through massive data analysis. This paper introduces the TextMap Access system, which provides ready access to a wealth of statistics on millions of people, places, and things across a number of web corpora. The system is powered by a flexible and scalable distributed statistics computation framework built on Hadoop. Its continually updated corpora include newspapers, blogs, patent records, legal documents, and scientific abstracts: well over a terabyte of raw text, growing daily. The Lydia TextMap Access system, available at http://www.textmap.com/access, provides instant access for students and scholars through a convenient web user interface. We describe the architecture of the TextMap Access system and its impact on current research in political science, sociology, and business/marketing.