Populating a domain ontology from web historical dictionaries and encyclopedias

  • Authors:
  • Eduardo Motta; Alexandre Andreatta; Sean Siqueira

  • Affiliations:
  • Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, RJ, Brazil (all three authors)

  • Venue:
  • Proceedings of the 2008 Euro American Conference on Telematics and Information Systems
  • Year:
  • 2008

Abstract

An increasing volume of information is available on the web, usually expressed as text, that is, as unstructured or semi-structured data. Because these texts are mainly intended for human consumption and interpretation, their semantic information is implicit, which makes it difficult to identify concepts automatically or to establish relations among them. In particular, some web sites contain historical information about artistic manifestations such as literature or music. Such sites hold a body of knowledge about the domain and are usually built with recurring format and content patterns that can be exploited for information extraction. Making this information available as a structured knowledge base requires an information extraction process. Ontologies are an appropriate way to represent structured knowledge bases, since they enable sharing, reuse and inference. This paper describes an information extraction process cycle for populating a domain ontology with instances of concepts, events and relations extracted from texts available on the internet; the cycle builds on existing ontology development methodologies and information extraction techniques. Through this process, latent concepts and relations expressed in natural language can be extracted and represented as an ontology, enabling new uses of the available content. A case study applying this process is presented.
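To make the idea of exploiting "format and content patterns" concrete, here is a minimal sketch of pattern-based ontology population. It is not the authors' process, which the abstract does not detail. It assumes a hypothetical entry layout ("Name (birth-death), role born in Place."), a hypothetical invented entry, and hypothetical class and property names (`Composer`, `BirthEvent`, `hasParticipant`, `hasYear`, `hasPlace`, `diedInYear`); only `rdf:type` is a real RDF term. Each matching entry yields an instance of a concept, an event instance, and relations between them, expressed as subject-predicate-object triples.

```python
import re
from dataclasses import dataclass

# Hypothetical layout pattern for one dictionary entry:
# "Name (birth_year-death_year), role born in Place."
ENTRY_PATTERN = re.compile(
    r"^(?P<name>[^(]+?)\s*\((?P<born>\d{4})-(?P<died>\d{4})\),\s*"
    r"(?P<role>\w+)\s+born in\s+(?P<place>[^.]+)\."
)


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def extract_triples(entry: str) -> list[Triple]:
    """Turn one entry into ontology assertions; return [] if it does not fit the pattern."""
    m = ENTRY_PATTERN.match(entry.strip())
    if m is None:
        return []
    person = m["name"].strip()
    birth = f"birth_of_{person.replace(' ', '_')}"  # hypothetical event identifier
    return [
        Triple(person, "rdf:type", m["role"].capitalize()),  # concept instance
        Triple(birth, "rdf:type", "BirthEvent"),             # event instance
        Triple(birth, "hasParticipant", person),             # event-person relation
        Triple(birth, "hasYear", m["born"]),
        Triple(birth, "hasPlace", m["place"].strip()),
        Triple(person, "diedInYear", m["died"]),
    ]


if __name__ == "__main__":
    for t in extract_triples("Jane Doe (1901-1975), composer born in Lisbon."):
        print(t)
```

Modeling the birth as a separate event node, rather than as attributes of the person, mirrors the abstract's distinction between concepts, events and relations; real dictionary entries would need many such patterns plus validation against the ontology.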