Extraction and organization of encyclopedic knowledge information using the World Wide Web

  • Authors: Atsushi Fujii; Tetsuya Ishikawa

  • Affiliations:
    Graduate School of Library, Information and Media Studies, University of Tsukuba, Tsukuba City, 305-8550 Japan and JST (Japan Science and Technology Agency), Kawaguchi City, 332-0012 Japan;
    Graduate School of Library, Information and Media Studies, University of Tsukuba, Tsukuba City, 305-8550 Japan

  • Venue: Systems and Computers in Japan
  • Year: 2005

Abstract

Although encyclopedias and dictionaries are valuable sources of knowledge about language, they often do not define neologisms or technical terms. In this research the authors focus on the fact that new and technical information frequently flows through the World Wide Web and propose a system to automatically generate encyclopedic knowledge using Web pages. This system extracts term descriptions from Web pages based on sentence representations and HTML layout. In addition, it provides organization by classifying the term descriptions based on domain and meaning, thus improving information quality. The results of evaluation experiments using technical terms found in the Information Technology Engineers Examination showed that encyclopedic information generated by this system had greater coverage and practical-level quality than existing dictionaries.

© 2005 Wiley Periodicals, Inc. Syst Comp Jpn, 36(14): 81–90, 2005; Published online in Wiley InterScience. DOI 10.1002/scj.10296
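
The extraction step mentioned in the abstract (term descriptions found through sentence patterns and HTML layout) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `DescriptionExtractor`, the function `extract_descriptions`, the choice of `<dt>`/`<dd>` pairs as the layout cue, and the English patterns ("is", "are", "refers to", "means") are all assumptions made for illustration. Classification by domain and meaning is not covered.

```python
import re
from html.parser import HTMLParser


class DescriptionExtractor(HTMLParser):
    """Hypothetical helper: collects <dt>/<dd> pairs and <p> text from a page."""

    def __init__(self):
        super().__init__()
        self.pairs = []        # layout-based (term, description) candidates
        self.paragraphs = []   # plain text of each <p> element
        self._tag = None       # tag currently being captured, if any
        self._buf = []         # text fragments inside the captured tag
        self._term = None      # most recent <dt> text awaiting its <dd>

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd", "p"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # ignore inline tags such as <b> inside a captured element
        text = " ".join("".join(self._buf).split())  # normalize whitespace
        if tag == "dt":
            self._term = text
        elif tag == "dd" and self._term:
            self.pairs.append((self._term, text))
            self._term = None
        elif tag == "p":
            self.paragraphs.append(text)
        self._tag = None


def extract_descriptions(html, term):
    """Return candidate descriptions of `term` found in `html`."""
    parser = DescriptionExtractor()
    parser.feed(html)
    parser.close()

    # Layout cue (assumed): the term appears as a <dt> followed by a <dd>.
    found = [d for t, d in parser.pairs if t.lower() == term.lower()]

    # Sentence cue (assumed): "<term> is/are/refers to/means ...".
    pattern = re.compile(
        r"\b" + re.escape(term) + r"\s+(?:is|are|refers to|means)\s+",
        re.IGNORECASE,
    )
    for para in parser.paragraphs:
        for sentence in re.split(r"(?<=[.!?])\s+", para):
            if pattern.search(sentence):
                found.append(sentence)
    return found


page = (
    "<dl><dt>XML</dt><dd>A markup language for structured documents.</dd></dl>"
    "<p>XML is a text format. It is widely used.</p>"
    "<p>HTML is another format.</p>"
)
descriptions = extract_descriptions(page, "XML")
```

Here `descriptions` holds one candidate from the definition list and one from the pattern match. A real system would need patterns suited to the target language and further filtering of noisy candidates.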