Combining lexical and formatting cues for named entity acquisition from the web

Authors:
Christian Jacquemin;Caroline Bush
Affiliations:
CNRS-LIMSI, ORSAY Cedex, France;CNRS-LIMSI, ORSAY Cedex, France
Venue:
EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
Year:
2000

Abstract

Because named entities (NEs) are constantly renewed, fresh ones must be acquired from recent text sources. We present a tool for acquiring and typing NEs from the Web that combines a harvester with three parallel shallow parsers, each dedicated to a specific structure (lists, enumerations, and anchors). The parsers combine lexical cues such as discourse markers with formatting instructions (HTML tags) to analyze enumerations and their associated initializers.
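
To make the idea of pairing a lexical cue with a formatting cue concrete, here is a minimal sketch in Python. It is not the authors' system: the marker inventory (`such as`, `including`), the names `ListHarvester` and `typed_entities`, and the choice of the single word before the marker as the entity type are all assumptions made for illustration. It also handles only flat `<ul>`/`<ol>` lists, not nested ones.

```python
import re
from html.parser import HTMLParser

# Hypothetical discourse markers; the abstract does not list the real inventory.
# The single word before the marker is taken as the type (a simplification).
MARKER = re.compile(
    r"\b(?P<type>\w+)\s+(?:such as|including)\s*:?\s*$", re.IGNORECASE
)


class ListHarvester(HTMLParser):
    """Collects (initializer text, list items) pairs from flat <ul>/<ol> lists."""

    def __init__(self):
        super().__init__()
        self.pending_text = ""  # text seen since the last list closed
        self.items = None       # items of the list currently open, else None
        self.in_item = False    # True while inside an <li>
        self.results = []       # (initializer, [items]) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol") and self.items is None:
            self.items = []
        elif tag == "li" and self.items is not None:
            self.items.append("")
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False
        elif tag in ("ul", "ol") and self.items is not None:
            cleaned = [item.strip() for item in self.items if item.strip()]
            self.results.append((self.pending_text.strip(), cleaned))
            self.pending_text, self.items, self.in_item = "", None, False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data      # text belongs to the current list item
        elif self.items is None:
            self.pending_text += data   # candidate initializer text


def typed_entities(html):
    """Return (type, entity) pairs for lists whose initializer ends in a marker."""
    parser = ListHarvester()
    parser.feed(html)
    parser.close()
    pairs = []
    for initializer, items in parser.results:
        match = MARKER.search(initializer)
        if match:
            pairs.extend((match.group("type"), item) for item in items)
    return pairs
```

In this sketch the HTML tags delimit the enumeration and its items, while the discourse marker in the preceding text (the initializer) supplies the candidate type. A list with no marker in its initializer, such as a navigation menu, yields no pairs.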