Web mining has been applied to improve web-based learning. Content-based web mining usually focuses on the main content of a web page. This paper proposes a novel approach to automatically extract the main content from web pages. Compared with existing studies, the method can determine whether a web page contains main content at all, and then extracts that content without using a DOM tree or templates. The main contributions are: (1) introducing a new concept of Block and proposing a method to partition a web page into blocks, so that main content and noise content are well separated into different blocks; (2) introducing the concept of Web Page Block Distribution and studying its features. Based on the Block Distribution, we can effectively determine whether a web page contains main content, and then extract it via outlier analysis. Experiments demonstrate the utility and feasibility of the method.
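The abstract does not define Block or Block Distribution precisely, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes that a block is the text between consecutive block-level tags (found with regular expressions rather than a DOM tree), that the block distribution is the list of block text lengths, and that main content shows up as blocks whose length is an upper outlier by z-score. The function names and the `z_threshold` parameter are hypothetical.

```python
import re
import statistics

# Assumption: block boundaries are block-level HTML tags; the paper's own
# partitioning rule is not given in the abstract.
BLOCK_SPLIT = re.compile(
    r"</?(?:div|p|td|tr|table|ul|ol|li|h[1-6]|br)\b[^>]*>", re.IGNORECASE
)
ANY_TAG = re.compile(r"<[^>]+>")
SCRIPT_OR_STYLE = re.compile(r"(?is)<(script|style)\b.*?</\1>")


def partition_blocks(html):
    """Split raw HTML into non-empty text blocks without building a DOM tree."""
    html = SCRIPT_OR_STYLE.sub(" ", html)
    blocks = []
    for piece in BLOCK_SPLIT.split(html):
        # Drop remaining inline tags and collapse whitespace.
        text = " ".join(ANY_TAG.sub(" ", piece).split())
        if text:
            blocks.append(text)
    return blocks


def block_distribution(blocks):
    """Assumed distribution: the text length of each block, in page order."""
    return [len(block) for block in blocks]


def extract_main_content(html, z_threshold=2.0):
    """Return blocks whose length is an upper outlier, or [] if none stand out.

    An empty result is read here as "this page has no main content".
    """
    blocks = partition_blocks(html)
    sizes = block_distribution(blocks)
    if len(sizes) < 2:
        return []
    mean = statistics.mean(sizes)
    stdev = statistics.pstdev(sizes)
    if stdev == 0:
        return []  # all blocks are the same size, so nothing is an outlier
    return [b for b, s in zip(blocks, sizes) if (s - mean) / stdev > z_threshold]
```

On a page made of many short navigation links and one long paragraph, only the paragraph exceeds the threshold; on a page of links alone, no block stands out and the result is empty. A z-score is used only because it is a simple form of outlier analysis; any other outlier test could be substituted.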