We describe the design and performance of WebBase, a tool for Web research. The system includes a highly customizable crawler, a repository for collected Web pages, an indexer for both text and link-related page features, and a high-speed content distribution facility. The distribution module lets researchers worldwide retrieve pages from WebBase and stream them across the Internet at high speed, so that they need not each crawl the Web before beginning their research. WebBase has been used by scores of research and teaching organizations worldwide, mostly for investigations into Web topology and linguistic content analysis. After describing the system's architecture, we explain our engineering decisions for each WebBase component and present performance measurements for each.