Organizing and searching the world wide web of facts - step one: the one-million fact extraction challenge

  • Authors:
  • Marius Pasca; Dekang Lin; Jeffrey Bigham; Andrei Lifchits; Alpa Jain

  • Affiliations:
  • Google Inc., Mountain View, CA; Google Inc., Mountain View, CA; Google Inc., Univ. of Washington, Seattle, WA; Google Inc., Univ. of British Columbia, Vancouver, BC; Google Inc., Columbia Univ., New York, NY

  • Venue:
  • AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2
  • Year:
  • 2006


Abstract

Due to the inherent difficulty of processing noisy text, the potential of the Web as a decentralized repository of human knowledge remains largely untapped during Web search. Access to billions of binary relations among named entities would enable new search paradigms and alternative methods for presenting search results. A first concrete step towards building large searchable repositories of factual knowledge is to derive such knowledge automatically, at large scale, from textual documents. Generalized contextual extraction patterns allow for fast iterative progression towards extracting one million facts of a given type (e.g., Person-BornIn-Year) from 100 million Web documents of arbitrary quality. The extraction starts from as few as 10 seed facts, requires no additional input knowledge or annotated text, and emphasizes scale and coverage by avoiding the use of syntactic parsers, named entity recognizers, gazetteers, and similar text processing tools and resources.
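To make the bootstrapping idea in the abstract concrete, the sketch below shows a minimal, hypothetical version of pattern-based fact extraction for a Person-BornIn-Year relation: seed pairs yield connecting contexts, the contexts are reused as patterns, and matched pairs are added back as new facts. All function names are illustrative, and verbatim infix strings stand in for the paper's generalized contextual patterns; crude regular expressions replace any named-entity tooling, which the paper deliberately avoids.

```python
import re
from collections import Counter

def find_patterns(sentences, seed_facts, min_support=2):
    """Collect short infix contexts that connect a seed (name, year) pair in a sentence."""
    counts = Counter()
    for s in sentences:
        for name, year in seed_facts:
            # Look for "<name> <infix> <year>" with a short connecting infix.
            m = re.search(re.escape(name) + r"\s+(.{1,40}?)\s+" + re.escape(year), s)
            if m:
                counts[m.group(1)] += 1
    # Keep only infixes seen with several distinct seeds (a crude reliability filter).
    return [p for p, c in counts.most_common() if c >= min_support]

def apply_patterns(sentences, patterns):
    """Match learned infixes against candidate name/year spans to extract new facts."""
    facts = set()
    name_re = r"([A-Z][a-z]+(?:\s[A-Z][a-z]+)+)"  # capitalized multi-word span as a proper-name proxy
    year_re = r"\b(1[6-9]\d{2}|20\d{2})\b"        # plausible birth years
    for s in sentences:
        for p in patterns:
            for m in re.finditer(name_re + r"\s+" + re.escape(p) + r"\s+" + year_re, s):
                facts.add((m.group(1), m.group(2)))
    return facts

def bootstrap(sentences, seeds, iterations=3):
    """One iteration per pass: seeds -> patterns -> new facts; repeat to grow the fact set."""
    facts = set(seeds)
    for _ in range(iterations):
        patterns = find_patterns(sentences, facts)
        facts |= apply_patterns(sentences, patterns)
    return facts

if __name__ == "__main__":
    sentences = [
        "Wolfgang Amadeus Mozart was born in 1756 in Salzburg.",
        "Marie Curie was born in 1867 in Warsaw.",
        "Alan Turing was born in 1912 in London.",
    ]
    seeds = {("Wolfgang Amadeus Mozart", "1756"), ("Marie Curie", "1867")}
    print(bootstrap(sentences, seeds))
```

At Web scale, the paper's approach replaces such verbatim infixes with generalized patterns so that a handful of seeds can cover many surface variations; this toy loop only illustrates the overall iterative structure.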