Web content outlier mining through mathematical approach and trust rating

Authors:
G. Poonkuzhali; K. Sarukesi; G. V. Uma
Affiliations:
Department of Computer Science and Engineering, Rajalakshmi Engineering College, Anna University, Chennai, Tamil Nadu, India
Hindustan Institute of Technology and Science, Chennai, Tamil Nadu, India
Department of Information Science & Technology, Anna University, Chennai, Tamil Nadu, India
Venue:
Proceedings of the 10th WSEAS International Conference on Applied Computer and Applied Computational Science (ACACOS'11)
Year:
2011

Abstract

In the Internet era, the World Wide Web is flooded with a voluminous amount of information, including many replicated and irrelevant web pages. Because most people rely on search engines to find the information they need, and because unnecessary and duplicated pages increase both indexing space and time complexity, finding and removing such pages has become a significant issue for the information retrieval and web mining research communities. Web content outlier mining plays a decisive role in addressing these issues. Existing algorithms for web content outlier mining apply weighting only to structured documents. In contrast, this work develops a mathematical approach, based on a two-way rectangular representation, a signed trust-rating scheme, and a correlation method, that retrieves relevant information without duplicates from both structured and unstructured web documents.
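The abstract does not say how the correlation step is computed. One plausible reading, offered only as a sketch under stated assumptions, is to represent the pages as a two-way documents-by-terms count table and flag page pairs whose rows are highly correlated as duplicates. The function names, the simple tokenizer, and the 0.95 threshold below are hypothetical; the signed trust-rating component is not modelled, and this should not be read as the authors' algorithm.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-case a document and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def doc_term_matrix(docs):
    """Build a two-way (documents x terms) count table over a shared, sorted vocabulary."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; returns 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def find_duplicates(docs, threshold=0.95):
    """Return index pairs whose term-count rows correlate at or above the (hypothetical) threshold."""
    _, matrix = doc_term_matrix(docs)
    return [
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if pearson(matrix[i], matrix[j]) >= threshold
    ]


if __name__ == "__main__":
    pages = [
        "Web mining finds useful patterns in web pages.",
        "Web mining finds useful patterns in web pages!",
        "Trust ratings help rank relevant documents.",
    ]
    print(find_duplicates(pages))  # [(0, 1)]
```

The first two pages differ only in punctuation, so their term-count rows are identical and they are reported as a duplicate pair, while the unrelated third page correlates negatively with both. Because this sketch compares bag-of-words counts, it ignores word order; an n-gram representation would be a stricter alternative.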