Natural Language Analysis for Semantic Document Modeling

Authors:
Terje Brasethvik; Jon Atle Gulla
Venue:
NLDB '00 Proceedings of the 5th International Conference on Applications of Natural Language to Information Systems-Revised Papers
Year:
2000

Abstract

To ease the retrieval of documents published on the Web, the documents should be classified in a way that users find helpful and meaningful. This paper presents an approach to semantic document classification and retrieval based on natural language analysis and conceptual modeling. A conceptual domain model is used in combination with linguistic tools to define a controlled vocabulary for a document collection. Users may browse this domain model and interactively classify documents by selecting model fragments that describe the contents of the documents. Natural language tools are used to analyze the text of the documents and propose relevant domain model concepts and relations. The proposed fragments are refined by the users and stored as XML document descriptions. For document retrieval, lexical analysis is used to pre-process search expressions and map them to the domain model for manual query refinement. A prototype of the system is described, and the approach is illustrated with examples from a document collection published by the Norwegian Center for Medical Informatics (KITH).
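The pipeline outlined in the abstract (propose domain model concepts from text, store them as an XML description, and map search expressions onto the same vocabulary) can be illustrated with a minimal sketch. This is not the authors' prototype: the concept names, the term lists in DOMAIN_MODEL, the XML element names, and the functions tokenize, propose_concepts, and describe_document are all hypothetical, and simple term matching stands in for the linguistic tools the paper uses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: concept name -> terms that signal it.
# A real domain model would also carry relations between concepts.
DOMAIN_MODEL = {
    "Patient": {"patient", "patients"},
    "HealthRecord": {"record", "records", "journal"},
    "Standard": {"standard", "standards"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def propose_concepts(text):
    """Return the concepts whose terms occur in the text, sorted by name."""
    tokens = set(tokenize(text))
    return sorted(c for c, terms in DOMAIN_MODEL.items() if tokens & terms)

def describe_document(doc_id, concepts):
    """Serialize the (user-refined) concept list as an XML description."""
    root = ET.Element("document", {"id": doc_id})
    for concept in concepts:
        ET.SubElement(root, "concept", {"name": concept})
    return ET.tostring(root, encoding="unicode")

# Classification: propose concepts for a document, then store them.
proposed = propose_concepts("Standards for electronic patient records")
description = describe_document("doc1", proposed)

# Retrieval: the same mapping is applied to a search expression, giving
# the user a set of concepts to refine manually.
query_concepts = propose_concepts("patient journal")
```

Using one mapping function for both documents and queries mirrors the idea that classification and retrieval share a single controlled vocabulary. In the approach described above, the user would review and adjust the proposed concepts before anything is stored or searched.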