Automated analysis of unstructured text documents (e.g., web pages, newswire articles, research publications, business reports) is a key capability for solving important problems in areas such as decision making, risk assessment, social network analysis, intelligence analysis, and scholarly research. However, as data sizes continue to grow in these areas, scalable processing, modeling, and semantic analysis of text collections become essential. In this paper, we present the ParaText text analysis engine, a distributed-memory software framework for processing, modeling, and analyzing collections of unstructured text documents. Results on several document collections using hundreds of processors are presented to illustrate the flexibility, extensibility, and scalability of the entire text modeling process, from raw data ingestion to application analysis.