Compressed string dictionaries
SEA'11 Proceedings of the 10th international conference on Experimental algorithms
URL (Uniform Resource Locator) normalization is an important activity in web mining. An effective URL normalization technique makes web data retrieval smoother and reduces the amount of computation in web mining activities. This paper proposes a web mining technique for URL normalization. The technique is based on the content, structure, and semantic similarity of a given set of URLs, together with their web page redirection and forwarding similarity. Web page redirection and forward graphs can be used to measure the similarity between URLs and to form URL clusters, which in turn can be used for URL normalization. A data structure is also suggested for storing forward and redirect URL information.
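As a rough illustration of the redirect-graph idea only, the sketch below groups URLs that are connected by redirect or forward edges and maps each one to a single representative. It is not the data structure or algorithm from the paper: the union-find structure, the function names `cluster_urls` and `normalize`, and the rule for picking the representative (shortest URL, ties broken alphabetically) are all assumptions made for the example, and content, structure, and semantic similarity are ignored.

```python
from collections import defaultdict


def cluster_urls(redirects):
    """Group URLs linked by redirect/forward edges into clusters.

    `redirects` is an iterable of (source_url, target_url) pairs.
    Union-find is an assumed stand-in, not the paper's data structure.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for src, dst in redirects:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst  # merge the two clusters

    clusters = defaultdict(set)
    for url in list(parent):
        clusters[find(url)].add(url)
    return list(clusters.values())


def normalize(redirects):
    """Map each URL to a representative of its cluster.

    Hypothetical rule: the shortest URL wins, ties broken alphabetically.
    """
    mapping = {}
    for cluster in cluster_urls(redirects):
        canonical = min(cluster, key=lambda u: (len(u), u))
        for url in cluster:
            mapping[url] = canonical
    return mapping


# Made-up redirect edges for illustration.
example_redirects = [
    ("http://example.com/index.html", "http://example.com/"),
    ("http://www.example.com/", "http://example.com/"),
    ("http://other.org/a", "http://other.org/b"),
]
example_mapping = normalize(example_redirects)
```

Treating the redirect edges as undirected keeps the sketch short; a version that keeps edge direction could instead choose the final redirect target as the representative.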