Wrapper induction: efficiency and expressiveness
Artificial Intelligence - Special issue on Intelligent internet systems
The purpose of this study is to acquire the event information published on municipal websites in a machine-processable form. In this paper, we propose a dictionary-based method for extracting information from an HTML document. The method first deletes the HTML tags from the document to convert it into plain text, and then extracts the target character strings by comparing that text with a collection of words prepared beforehand. An evaluation experiment was conducted on the websites of the 23 wards of Tokyo and 56 municipalities in Chiba Prefecture, Japan. Overall, the proposed method extracted 73% of the event information, compared with 52% for the LR-Wrapper, 55% for the Tree-Wrapper, and 32% for the PLR-Wrapper. These results confirm that combining a simple algorithm with a collection of words extracts event information at a higher rate than the existing methods.
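The two steps described in the abstract (removing the HTML tags, then comparing the remaining text with a prepared word collection) can be illustrated with a minimal sketch. This is not the authors' implementation: the class and function names, the sample page, and the contents of `EVENT_DICTIONARY` are all hypothetical, and simple substring matching is assumed as the comparison rule because the abstract does not specify one.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []        # non-empty text fragments, in document order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.parts.append(text)


def html_to_lines(html):
    """Step 1: delete the tags and return the remaining text fragments."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


def extract_by_dictionary(lines, dictionary):
    """Step 2: return (field, line) pairs for lines containing a dictionary word.

    Substring matching is an assumption made for this sketch.
    """
    hits = []
    for line in lines:
        for field, words in dictionary.items():
            if any(word in line for word in words):
                hits.append((field, line))
                break  # assign each line to at most one field
    return hits


# Hypothetical sample page and word collection, for illustration only.
SAMPLE_HTML = """
<html><body>
<h1>Summer Festival</h1>
<p>Date: August 10</p>
<p>Venue: City Hall Plaza</p>
<script>var x = 1;</script>
</body></html>
"""

EVENT_DICTIONARY = {
    "date": ["Date", "August"],
    "place": ["Venue", "Plaza"],
}

lines = html_to_lines(SAMPLE_HTML)
events = extract_by_dictionary(lines, EVENT_DICTIONARY)
for field, line in events:
    print(field, "->", line)
```

Because this approach looks only at the text, it does not depend on the tag layout of any particular page. A real word collection would need to cover the date, place, and title expressions that municipal pages actually use.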