A robust linguistic platform for efficient and domain specific web content analysis

Authors:
Thierry Hamon;Adeline Nazarenko;Thierry Poibeau;Sophie Aubin;Julien Derivière
Affiliations:
LIPN -- UMR CNRS, Villetaneuse, France;LIPN -- UMR CNRS, Villetaneuse, France;LIPN -- UMR CNRS, Villetaneuse, France;LIPN -- UMR CNRS, Villetaneuse, France;LIPN -- UMR CNRS, Villetaneuse, France
Venue:
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Year:
2007


Abstract

Semantic access to web content in specific domains calls for specialized search engines with enhanced semantic querying and indexing capacities, which draw on both information retrieval (IR) and information extraction (IE). A rich linguistic analysis is required either to identify the relevant semantic units to index and to weight them according to their linguistically specific statistical distribution, or to serve as the basis of an information extraction process. Recent developments have made Natural Language Processing (NLP) techniques reliable enough to process large collections of documents and to enrich them with semantic annotations. This paper focuses on the design and development of a text processing platform, Ogmios, which was developed in the ALVIS project. The Ogmios platform exploits existing NLP modules and resources, which may be tuned to specific domains, and produces linguistically annotated documents. We show how the three constraints of genericity, domain semantic awareness and performance can be handled together.
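The idea of chaining reusable NLP modules, tuning one of them with a domain resource, and producing an annotated document can be illustrated with a minimal Python sketch. It is not the Ogmios implementation or its API: the names `Annotation`, `Document`, `tokenizer`, `make_term_tagger` and `run_pipeline`, the stand-off annotation layout, and the two-entry lexicon are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    layer: str  # annotation layer, e.g. "token" or "term"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    value: str  # surface form or semantic type


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def layer(self, name: str) -> List[Annotation]:
        """Return the annotations of one layer, in insertion order."""
        return [a for a in self.annotations if a.layer == name]


# A module is any callable that adds annotations to a document in place.
Module = Callable[[Document], None]


def tokenizer(doc: Document) -> None:
    """Generic module: one 'token' annotation per run of word characters."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("token", m.start(), m.end(), m.group()))


def make_term_tagger(lexicon: Dict[str, str]) -> Module:
    """Domain-tuned module built from a lexicon mapping term -> semantic type."""
    def term_tagger(doc: Document) -> None:
        for term, sem_type in lexicon.items():
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, doc.text, flags=re.IGNORECASE):
                doc.annotations.append(
                    Annotation("term", m.start(), m.end(), sem_type)
                )
    return term_tagger


def run_pipeline(doc: Document, modules: List[Module]) -> Document:
    """Apply each module in order; the document accumulates annotations."""
    for module in modules:
        module(doc)
    return doc


# Hypothetical domain resource and input sentence, invented for this sketch.
biology_lexicon = {"sigma factor": "protein", "Bacillus subtilis": "organism"}
doc = run_pipeline(
    Document("The sigma factor of Bacillus subtilis controls transcription."),
    [tokenizer, make_term_tagger(biology_lexicon)],
)
terms = [(doc.text[a.start:a.end], a.value) for a in doc.layer("term")]
print(terms)
```

In this sketch the pipeline code stays generic and only the lexicon passed to `make_term_tagger` changes from one domain to another, which is one simple way to reconcile genericity with domain awareness. Keeping annotations as offsets into the unchanged text lets later modules build on earlier layers without rewriting the document.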