Many modern natural language-processing applications use search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search engines are designed and optimized for simple human queries and are not well suited to support such applications. As a result, these applications are forced to issue millions of successive queries, which places unnecessary load on the search engine and makes the applications slow, with limited scalability.

In response, this paper introduces the Bindings Engine (BE), which supports queries containing typed variables and string-processing functions. For example, in response to the query "powerful ‹noun›", BE will return all the nouns in its index that immediately follow the word "powerful", sorted by frequency. In response to the query "Cities such as ProperNoun(Head(‹NounPhrase›))", BE will return a list of proper nouns likely to be city names.

BE's novel neighborhood index enables it to do so with O(k) random disk seeks and O(k) serial disk reads, where k is the number of non-variable terms in its query. As a result, BE can yield several orders of magnitude speedup for large-scale language-processing applications. The main cost is a modest increase in space to store the index. We report on experiments validating these claims and analyze how BE's space-time tradeoff scales with the size of its index and the number of variable types. Finally, we describe how a BE-based application extracts thousands of facts from the Web at interactive speeds in response to simple user queries.
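
To make the "powerful ‹noun›" query concrete, here is a minimal in-memory sketch in Python. It is not BE's implementation: the abstract does not describe the layout of the neighborhood index, so the structure below (a posting list that also stores each occurrence's right-hand neighbor and its type) is only an assumption chosen to illustrate the idea. The toy corpus, the tag names, and the functions `build_index` and `query_term_then_type` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each document is a list of (token, type tag) pairs.
DOCS = [
    [("a", "det"), ("powerful", "adj"), ("engine", "noun")],
    [("powerful", "adj"), ("engine", "noun"), ("runs", "verb")],
    [("powerful", "adj"), ("ideas", "noun"), ("spread", "verb")],
    [("powerful", "adj"), ("and", "conj"), ("fast", "adj")],
]


def build_index(docs):
    """Map each term to postings that also record its right-hand neighbor.

    Assumption: keeping the neighbor (text and type) inside the posting
    lets a query with one typed variable be answered from the posting
    list of its literal term alone, at the cost of a larger index.
    """
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for pos, (token, _tag) in enumerate(doc):
            right = doc[pos + 1] if pos + 1 < len(doc) else None
            index[token].append((doc_id, pos, right))
    return index


def query_term_then_type(index, term, var_type):
    """Answer a query of the form '<term> <var_type>'.

    Returns (binding, count) pairs sorted by descending frequency.
    Only the posting list of the literal term is read.
    """
    counts = Counter()
    for _doc_id, _pos, right in index.get(term, []):
        if right is not None and right[1] == var_type:
            counts[right[0]] += 1
    return counts.most_common()


if __name__ == "__main__":
    idx = build_index(DOCS)
    for binding, count in query_term_then_type(idx, "powerful", "noun"):
        print(binding, count)
```

In this sketch the work per query depends on the posting list of the single literal term, which mirrors the abstract's point that the cost is tied to the number of non-variable terms, and the extra neighbor data stored with every posting corresponds to the space overhead the abstract mentions.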