The main issue in effective Web information extraction is how to recognize similar patterns in a Web page. Pattern matching on the HTML DOM tree has traditionally been shown to be more efficient than simple string matching. Nonetheless, previous tree-based pattern matching methods are limited by the assumption that all HTML tags have the same value: they assign the same weight to every node in the HTML tree. This paper proposes an enhanced tree matching algorithm that improves the tree edit distance method by considering the characteristics of HTML features. We assign different values to different HTML tree nodes according to their weights for displaying the corresponding data objects in the browser. HTML patterns are matched by obtaining the maximum mapping value of two HTML trees that are constructed with weighted node values from HTML data objects. Experiments on several Web commerce sites evaluate the effectiveness of the proposed HTML tree matching algorithm.
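
The idea of a maximum mapping over weighted nodes can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a weighted variant of the well-known Simple Tree Matching recursion, and the `Node` class, the `TAG_WEIGHTS` table, its numeric values, and the function names are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical weights (not from the paper): tags assumed to matter more
# for how a data object is displayed get a larger value.
TAG_WEIGHTS: Dict[str, float] = {
    "table": 3.0, "tr": 2.0, "td": 2.0, "img": 2.0, "a": 1.5,
}
DEFAULT_WEIGHT = 1.0  # assumed weight for any tag not listed above


@dataclass
class Node:
    """A bare-bones ordered tree node holding only an HTML tag name."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def weight(node: Node) -> float:
    return TAG_WEIGHTS.get(node.tag, DEFAULT_WEIGHT)


def weighted_match(a: Node, b: Node) -> float:
    """Maximum weighted mapping value between two ordered trees.

    Roots with different tags contribute nothing. Otherwise the root
    weight is added to the best order-preserving pairing of the child
    subtrees, found with a dynamic program over the two child lists.
    """
    if a.tag != b.tag:
        return 0.0
    m, n = len(a.children), len(b.children)
    best = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = best[i - 1][j - 1] + weighted_match(
                a.children[i - 1], b.children[j - 1]
            )
            best[i][j] = max(best[i - 1][j], best[i][j - 1], paired)
    return weight(a) + best[m][n]


# Two product-row patterns that differ in one leaf tag.
row = Node("tr", [Node("td", [Node("a")]), Node("td", [Node("img")])])
other = Node("tr", [Node("td", [Node("a")]), Node("td", [Node("span")])])

print(weighted_match(row, row))    # mapping value of a tree with itself
print(weighted_match(row, other))  # lower, because img and span do not match
```

With uniform weights this reduces to counting matched nodes, which is the equal-weight assumption the abstract criticizes. Dividing the result by the mapping value of a tree with itself would give a similarity score between 0 and 1, if a normalized measure is needed.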