KnowItNow: fast, scalable information extraction from the web

Authors:
Michael J. Cafarella; Doug Downey; Stephen Soderland; Oren Etzioni
Affiliations:
University of Washington, Seattle, WA (all four authors)
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005

Abstract

Numerous NLP applications rely on search-engine queries, both to extract information from the Web corpus and to compute statistics over it. But search engines often cap the number of queries a client may issue. As a result, query-intensive NLP applications such as Information Extraction (IE) distribute their query load over several days, making IE a slow, offline process.

This paper introduces a novel architecture for IE that obviates queries to commercial search engines. The architecture is embodied in a system called KnowItNow that performs high-precision IE in minutes instead of days. We compare KnowItNow experimentally with the previously published KnowItAll system and quantify the tradeoff between recall and speed. KnowItNow's extraction rate is two to three orders of magnitude higher than KnowItAll's.
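The scale of the gap between "days" and "minutes" can be checked with simple arithmetic. The sketch below is illustrative only: the query count (400,000) and the daily quota (100,000) are hypothetical numbers chosen for the example, not figures from the paper, and the helper names `days_to_finish` and `faster_runtime_bounds` are likewise hypothetical. It also assumes that both systems produce the same number of extractions, so a 10^2 to 10^3 times higher extraction rate translates directly into a 10^2 to 10^3 times shorter runtime.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

def days_to_finish(queries_needed: int, daily_quota: int) -> int:
    # Whole days required when the engine allows at most daily_quota queries per day.
    if daily_quota <= 0:
        raise ValueError("daily_quota must be positive")
    return math.ceil(queries_needed / daily_quota)

def faster_runtime_bounds(baseline_seconds: float) -> tuple[float, float]:
    # Runtime for the same workload (assumption: same number of extractions)
    # if the extraction rate is 10^2 to 10^3 times higher.
    # Returns (fastest, slowest) in seconds.
    return baseline_seconds / 1_000, baseline_seconds / 100

# Hypothetical workload: 400,000 queries against a 100,000-query daily quota.
days = days_to_finish(400_000, 100_000)
fastest, slowest = faster_runtime_bounds(days * SECONDS_PER_DAY)
print(days)                                             # 4
print(round(fastest / 60, 1), round(slowest / 60, 1))   # 5.8 57.6
```

Under these made-up numbers, a four-day, quota-bound run shrinks to roughly 6 to 58 minutes, which is consistent with the abstract's "minutes instead of days" framing.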