Towards terascale knowledge acquisition

Authors:
Patrick Pantel;Deepak Ravichandran;Eduard Hovy
Affiliations:
University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004


Abstract

Although vast amounts of textual data are freely available, many NLP algorithms exploit only a minute percentage of it. In this paper, we study the challenges of working at the terascale. We present an algorithm, designed for the terascale, for mining is-a relations that achieves performance similar to that of a state-of-the-art linguistically-rich method. We focus on the accuracy of these two systems as a function of processing time and corpus size.