A search engine for natural language applications

Authors:
Michael J. Cafarella;Oren Etzioni
Affiliations:
University of Washington, Seattle, WA;University of Washington, Seattle, WA
Venue:
WWW '05 Proceedings of the 14th international conference on World Wide Web
Year:
2005

Abstract

Many modern natural language-processing applications utilize search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search engines are designed and optimized for simple human queries; they are not well suited to support such applications. As a result, these applications are forced to issue millions of successive queries, resulting in unnecessary search engine load and in slow applications with limited scalability.

In response, this paper introduces the Bindings Engine (BE), which supports queries containing typed variables and string-processing functions. For example, in response to the query "powerful ‹noun›" BE will return all the nouns in its index that immediately follow the word "powerful", sorted by frequency. In response to the query "Cities such as ProperNoun(Head(‹NounPhrase›))", BE will return a list of proper nouns likely to be city names.

BE's novel neighborhood index enables it to do so with O(k) random disk seeks and O(k) serial disk reads, where k is the number of non-variable terms in its query. As a result, BE can yield several orders of magnitude speedup for large-scale language-processing applications. The main cost is a modest increase in space to store the index. We report on experiments validating these claims, and analyze how BE's space-time tradeoff scales with the size of its index and the number of variable types. Finally, we describe how a BE-based application extracts thousands of facts from the Web at interactive speeds in response to simple user queries.
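
The "powerful ‹noun›" query can be illustrated with a small in-memory sketch. This is not BE's implementation: the abstract does not describe the neighborhood index layout, so the sketch simply assumes that each term's index entry records the typed token immediately to its right. The corpus, the tag names, and the functions `build_neighbor_index` and `query` are all hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is a list of (word, part-of-speech) pairs.
CORPUS = [
    [("a", "det"), ("powerful", "adj"), ("engine", "noun")],
    [("powerful", "adj"), ("engine", "noun"), ("runs", "verb")],
    [("the", "det"), ("powerful", "adj"), ("storm", "noun")],
    [("powerful", "adj"), ("and", "conj"), ("fast", "adj")],
]


def build_neighbor_index(corpus):
    """Map each word to the (word, tag) pairs that immediately follow it.

    Assumption: storing the right-hand neighbor next to each occurrence
    stands in for the paper's neighborhood index.
    """
    index = defaultdict(list)
    for sentence in corpus:
        for (word, _tag), right in zip(sentence, sentence[1:]):
            index[word.lower()].append(right)
    return index


def query(index, term, var_type):
    """Answer '<term> <var_type>': matching neighbors, most frequent first."""
    neighbors = index.get(term.lower(), [])
    counts = Counter(word for word, tag in neighbors if tag == var_type)
    return counts.most_common()


index = build_neighbor_index(CORPUS)
results = query(index, "powerful", "noun")
print(results)
```

In this sketch, one lookup of the fixed term "powerful" is enough to bind the variable, because the candidate bindings are stored alongside the term's occurrences. That mirrors the abstract's claim that query cost depends on the number of non-variable terms rather than on the number of possible bindings, although the real system works against an on-disk index.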