How to build a WebFountain: An architecture for very large-scale text analytics

Authors:
D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins, J. Zien
Affiliations:
IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, California 95120 (all authors)
Venue:
IBM Systems Journal
Year:
2004


Abstract

WebFountain is a platform for very large-scale text analytics applications. The platform provides uniform access to a wide variety of data sources; scalable, system-managed deployment of document-level "augmenters" and corpus-level "miners"; and the creation of an extensible set of hosted Web services whose information drives end-user applications. Partners can author analytical components remotely using a collection of Web service APIs (application programming interfaces). The system is operational and supports live customers. This paper surveys the high-level design decisions made in creating such a system.
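The abstract separates two kinds of analytical components: document-level augmenters, which annotate one document at a time, and corpus-level miners, which aggregate over the whole collection. The following is a minimal, single-process sketch of that split, assuming a simple in-memory document record. It is not WebFountain's actual API (which is exposed as Web services). Every name here (`Document`, `token_augmenter`, `document_frequency_miner`, `run_pipeline`) is hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


# Hypothetical document record: raw text plus annotations added by augmenters.
@dataclass
class Document:
    doc_id: str
    text: str
    annotations: Dict[str, object] = field(default_factory=dict)


# Hypothetical augmenter type: sees exactly one document and annotates it in place,
# so augmenters could run on many documents in parallel with no shared state.
Augmenter = Callable[[Document], None]

# Hypothetical miner type: sees the whole augmented corpus and returns an aggregate.
Miner = Callable[[Iterable[Document]], object]


def token_augmenter(doc: Document) -> None:
    """Illustrative augmenter: store lowercase whitespace tokens on the document."""
    doc.annotations["tokens"] = doc.text.lower().split()


def document_frequency_miner(docs: Iterable[Document]) -> Counter:
    """Illustrative miner: count how many documents contain each token."""
    df: Counter = Counter()
    for doc in docs:
        # Use a set so each token counts at most once per document.
        df.update(set(doc.annotations.get("tokens", [])))
    return df


def run_pipeline(docs: List[Document], augmenters: List[Augmenter], miner: Miner) -> object:
    # Stage 1: document-level pass. Each call touches a single document only.
    for doc in docs:
        for augment in augmenters:
            augment(doc)
    # Stage 2: corpus-level pass over the already-augmented documents.
    return miner(docs)


if __name__ == "__main__":
    corpus = [
        Document("d1", "IBM builds WebFountain"),
        Document("d2", "WebFountain mines text"),
    ]
    result = run_pipeline(corpus, [token_augmenter], document_frequency_miner)
    print(result["webfountain"])  # prints 2
```

Keeping augmenters free of cross-document state lets this first stage be partitioned across machines. Only the miner stage needs a view of the full corpus, which matches the document-level versus corpus-level distinction the abstract draws.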