Turning the web into a database: extracting data and structure

Author:
Eduard H. Hovy
Affiliation:
Information Sciences Institute, University of Southern California
Venue:
NLDB '09: Proceedings of the 14th International Conference on Applications of Natural Language to Information Systems
Year:
2009

Abstract

People build databases to collect and systematize knowledge and to make it available to users in a consistent and, one hopes, trustworthy form. But the largest data collection today, the web, is not systematic, consistent, or trustworthy, and the access techniques we use are provably inadequate. Focusing on text alone, what would it take to extract information from the web, organize it, and form a database (both instances and metadata) from it? This paper discusses some of the core problems and gives examples of recent NLP research on automated instance mining, metadata structure harvesting, and inter-concept relation discovery.
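To make "automated instance mining" concrete, here is a minimal illustrative sketch. It is not the method described in this paper. It uses a single lexico-syntactic "X such as A, B and C" pattern in the style of Hearst's hyponym patterns. It assumes that instance names are runs of capitalized tokens and that the class is the lowercase word directly before "such as". The names `NAME`, `NAME_LIST`, `SUCH_AS`, and `mine_instances` are hypothetical. Real systems add many more patterns, redundancy-based scoring, and filtering.

```python
import re
from collections import Counter

# Assumption: an instance name is a run of capitalized tokens, e.g. "Los Angeles".
NAME = r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*"
# A list of names joined by commas and/or a final "and"/"or".
NAME_LIST = rf"{NAME}(?:\s*,\s*(?:(?:and|or)\s+)?{NAME}|\s+(?:and|or)\s+{NAME})*"
# Hearst-style pattern: "<lowercase class noun> such as <name list>".
SUCH_AS = re.compile(rf"\b([a-z]+)\s+such\s+as\s+({NAME_LIST})")


def mine_instances(sentences):
    """Count (class, instance) pairs matched by the "such as" pattern."""
    pairs = Counter()
    for sentence in sentences:
        for match in SUCH_AS.finditer(sentence):
            cls, names = match.group(1), match.group(2)
            # Pull the individual names back out of the matched list text;
            # "and"/"or" are lowercase, so NAME never matches them.
            for name in re.findall(NAME, names):
                pairs[(cls, name)] += 1
    return pairs


if __name__ == "__main__":
    corpus = [
        "We visited cities such as Paris, Rome and Berlin.",
        "Other cities such as Paris were crowded.",
    ]
    for (cls, name), count in mine_instances(corpus).most_common():
        print(f"{cls}\t{name}\t{count}")
```

Counting how often each pair recurs across sentences is a simple way to use the web's redundancy: an instance that many independent sentences place in the same class is more likely to be correct than one seen once.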