A Web Information Extraction System to DB Prototyping

Authors:
P. Moreda; Rafael Muñoz; Patricio Martínez-Barco; Cristina Cachero; Manuel Palomar
Venue:
NLDB '02 Proceedings of the 6th International Conference on Applications of Natural Language to Information Systems-Revised Papers
Year:
2002



Abstract

Database prototyping is a technique widely used both to validate user requirements and to verify application functionality. These tasks usually require populating the underlying data structures with sample data that may also need to satisfy certain constraints. Although some existing approaches already automate this population task through random data generation, the resulting structures lack semantic meaning, which may hinder both user validation and designer verification.

To solve this problem and make the resulting prototypes more intuitive, this paper presents a population system that, starting from the information contained in a UML-compliant Domain Conceptual Model, applies Information Extraction techniques to compile meaningful information sets from texts available on the Internet. The system relies on semantic information extracted from the EuroWordNet (EWN) lexical resource and includes, among other features, a named entity recognition system and an ontology that speed up the prototyping process and improve the quality of the sample data.
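The abstract contrasts random data generation with populating tables from entities found in real texts. The following is a minimal sketch of that contrast under heavy simplifying assumptions, not the authors' system. `ATTRIBUTE_CATEGORY` is a hypothetical hand-written stand-in for the EWN-based semantic lookup. `TRIGGERS` is a hypothetical list of context words that replaces a real named entity recognizer. `extract_entities` and `populate` are illustrative names only.

```python
import re

# HYPOTHETICAL stand-in for the EWN lookup: maps conceptual-model
# attribute names to a semantic category.
ATTRIBUTE_CATEGORY = {
    "city": "LOCATION",
    "birthplace": "LOCATION",
    "employer": "ORGANIZATION",
    "company": "ORGANIZATION",
}

# HYPOTHETICAL context words: a capitalized phrase that follows one of
# these words is taken as an entity of that category (a crude NER).
TRIGGERS = {
    "LOCATION": ("in", "from", "near"),
    "ORGANIZATION": ("at", "for", "joined"),
}

# One or more capitalized words, e.g. "Iberia Airlines".
CAPITALIZED = r"([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"


def extract_entities(text: str, category: str) -> list[str]:
    """Return distinct capitalized phrases preceded by a trigger word of `category`."""
    found: list[str] = []
    for trigger in TRIGGERS[category]:
        for match in re.finditer(rf"\b{trigger}\s+{CAPITALIZED}", text):
            if match.group(1) not in found:
                found.append(match.group(1))
    return found


def populate(attributes: list[str], texts: list[str], rows: int) -> list[dict[str, str]]:
    """Build up to `rows` sample tuples whose values are entities taken from `texts`.

    Attributes with no known category get an empty value pool, so in this
    sketch they yield no rows at all (a fuller system might fall back to
    random generation for them).
    """
    pools: dict[str, list[str]] = {}
    for attr in attributes:
        category = ATTRIBUTE_CATEGORY.get(attr)
        values: list[str] = []
        if category is not None:
            for text in texts:
                for value in extract_entities(text, category):
                    if value not in values:
                        values.append(value)
        pools[attr] = values
    # Only as many rows as the smallest value pool can fill.
    count = min([rows] + [len(v) for v in pools.values()])
    return [{attr: pools[attr][i] for attr in attributes} for i in range(count)]
```

For example, given the sentences "Maria was born in Alicante and works at Acme Corp." and "Juan moved from Valencia and joined Iberia Airlines.", calling `populate(["city", "employer"], texts, rows=5)` yields two tuples with readable values such as `Alicante` and `Acme Corp` instead of random strings.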