Measuring web page similarity is an important task in web mining and information retrieval. This paper introduces a method for measuring web page similarity that considers both the textual and the visual properties of pages. The textual properties of a page are described by a modified weighted vector space model. General visual properties are captured by segmenting the page into visual blocks, whose properties are stored in a vector of visual properties. Both vectors are then used to compute the overall web page similarity. The paper describes the method in detail and presents the results of several experiments.
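The abstract does not say how the two vectors are weighted or combined, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes raw term counts in place of the modified weighting scheme, cosine similarity for both vectors, and a linear blend controlled by a hypothetical parameter `alpha`. The visual feature names (`block_count`, `mean_block_area`) are likewise made up for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def text_vector(text):
    """Assumption: raw term counts stand in for the paper's modified weights."""
    return dict(Counter(text.lower().split()))


def page_similarity(text_a, visual_a, text_b, visual_b, alpha=0.5):
    """Blend textual and visual similarity.

    Assumption: a linear combination weighted by `alpha`; the abstract
    only states that both vectors contribute to the overall similarity.
    """
    text_sim = cosine(text_vector(text_a), text_vector(text_b))
    visual_sim = cosine(visual_a, visual_b)
    return alpha * text_sim + (1.0 - alpha) * visual_sim


# Hypothetical visual features derived from a page segmentation.
layout = {"block_count": 5.0, "mean_block_area": 0.2}
score = page_similarity("web mining survey", layout,
                        "cooking recipes blog", layout)
```

In this usage the two pages share no terms but have identical layout vectors, so with `alpha=0.5` the score is driven entirely by the visual half. Moving `alpha` toward 1 would make the measure behave like a purely textual comparison.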