Web page analysis based on HTML DOM and its usage for forum statistics and alerts
ECC'10 Proceedings of the 4th conference on European computing conference
WSEAS Transactions on Computers
To extract information automatically from semi-structured web pages, this paper proposes IESS, a method that discovers record models using the DOM and Maximal Similar Sub Tree matching. IESS identifies records automatically and correctly even when records of the same type differ in how they are presented. To test the method's performance, the authors designed a statistical analysis system for scientific literature. In practice, the system helps users quickly see how papers are distributed within the field they are searching and judge which of those papers are important.
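The abstract does not say how IESS compares subtrees, so the following is only a rough sketch of the general idea under stated assumptions: build a DOM tree, then treat runs of structurally similar sibling subtrees as repeated records, even when individual records differ slightly. It uses Python's standard `html.parser` and substitutes the well-known Simple Tree Matching algorithm for the paper's Maximal Similar Sub Tree procedure. The names `TreeBuilder`, `stm`, `similarity`, `find_record_groups`, and the 0.8 threshold are hypothetical choices, not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "source", "wbr"}


@dataclass
class Node:
    tag: str
    children: list[Node] = field(default_factory=list)

    def size(self) -> int:
        # Number of element nodes in this subtree (text is ignored).
        return 1 + sum(c.size() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a tag-only DOM tree; text content is deliberately dropped."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating unclosed tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def stm(a: Node, b: Node) -> int:
    """Simple Tree Matching (a stand-in, not the paper's algorithm):
    size of the largest order-preserving top-down matching of a and b."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(M[i - 1][j], M[i][j - 1],
                          M[i - 1][j - 1] + stm(a.children[i - 1], b.children[j - 1]))
    return M[m][n] + 1


def similarity(a: Node, b: Node) -> float:
    # Normalised to [0, 1]; 1.0 means the two subtrees have identical structure.
    return 2 * stm(a, b) / (a.size() + b.size())


def find_record_groups(node: Node, threshold: float = 0.8):
    """Return (parent_tag, records) for each run of >= 2 adjacent similar siblings."""
    groups, run = [], []
    for child in node.children:
        if run and similarity(run[-1], child) >= threshold:
            run.append(child)
        else:
            if len(run) >= 2:
                groups.append((node.tag, run))
            run = [child]
    if len(run) >= 2:
        groups.append((node.tag, run))
    for child in node.children:
        groups.extend(find_record_groups(child, threshold))
    return groups


# Hypothetical forum-style page: the second post has an extra <img>.
page = """
<html><body>
<div id="list">
  <div class="post"><a href="#">Title A</a><span>Alice</span><p>Body</p></div>
  <div class="post"><a href="#">Title B</a><span>Bob</span><p>Body</p><img src="x.png"></div>
  <div class="post"><a href="#">Title C</a><span>Carol</span><p>Body</p></div>
</div>
<div id="footer"><p>About</p></div>
</body></html>
"""

builder = TreeBuilder()
builder.feed(page)
groups = find_record_groups(builder.root)
for parent, records in groups:
    print(parent, len(records))  # prints "div 3"
```

Because similarity is normalised rather than exact, the post with the extra `<img>` still scores 8/9 against its neighbours and is grouped with them, which mirrors the abstract's goal of recognising same-type records whose presentation varies. The threshold trades recall against false groupings and would need tuning on real pages.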