Names and similarities on the web: fact extraction in the fast lane

Authors:
Marius Paşca (Google Inc., Mountain View, CA)
Dekang Lin (Google Inc., Mountain View, CA)
Jeffrey Bigham (University of Washington, Seattle, WA)
Andrei Lifchits (University of British Columbia, Vancouver, BC)
Alpa Jain (Columbia University, New York, NY)
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Abstract

In a new approach to large-scale extraction of facts from unstructured text, distributional similarities become an integral part of both the iterative acquisition of high-coverage contextual extraction patterns, and the validation and ranking of candidate facts. The evaluation measures the quality and coverage of facts extracted from one hundred million Web documents, starting from ten seed facts and using no additional knowledge, lexicons or complex tools.
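
The iterative loop the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sentences, the hand-written SIMILAR table (standing in for distributional similarities that a real system would compute from corpus statistics), and all function names are assumptions made for illustration. The sketch covers only pattern acquisition and generalization; the validation and ranking of candidate facts is omitted.

```python
# Toy corpus and seed fact: hypothetical data for illustration only.
SENTENCES = [
    "Mozart was born in 1756 .",
    "Einstein was born in 1879 .",
    "Chopin was baptized in 1810 .",
    "Darwin wrote a book in 1859 .",
]
SEEDS = {("Mozart", "1756")}

# Hypothetical classes of distributionally similar words, written by hand here.
SIMILAR = {"born": {"born", "baptized"}}


def generalize(infix):
    """Replace each infix word with the class of words assumed similar to it."""
    return tuple(frozenset(SIMILAR.get(word, {word})) for word in infix)


def learn_patterns(sentences, facts):
    """Collect generalized infix patterns found between the two fact arguments."""
    patterns = set()
    for sentence in sentences:
        tokens = sentence.split()
        for left, right in facts:
            if left in tokens and right in tokens:
                i, j = tokens.index(left), tokens.index(right)
                if i + 1 < j:
                    patterns.add(generalize(tokens[i + 1:j]))
    return patterns


def apply_patterns(sentences, patterns):
    """Extract (left, right) pairs around every span that matches a pattern."""
    found = set()
    for sentence in sentences:
        tokens = sentence.split()
        for pattern in patterns:
            n = len(pattern)
            # Leave room for one token before and one after the infix.
            for i in range(1, len(tokens) - n):
                window = tokens[i:i + n]
                if all(word in cls for word, cls in zip(window, pattern)):
                    found.add((tokens[i - 1], tokens[i + n]))
    return found


def bootstrap(sentences, seeds, iterations=2):
    """Alternate between learning patterns from facts and facts from patterns."""
    facts = set(seeds)
    for _ in range(iterations):
        patterns = learn_patterns(sentences, facts)
        facts |= apply_patterns(sentences, patterns)
    return facts


if __name__ == "__main__":
    for fact in sorted(bootstrap(SENTENCES, SEEDS)):
        print(fact)
```

In this toy run, the single seed yields the infix "was born in"; because the assumed similarity class groups "baptized" with "born", the generalized pattern also matches the Chopin sentence, while the Darwin sentence matches nothing. The Chopin match also shows why candidate facts need the validation and ranking step that this sketch leaves out.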