FindWDO: a k-nearest neighbors approach for detecting Web document outliers

Authors:
Ayman Tanira;Ahmed Rafea;Hesham Hassan
Affiliations:
Palestine Technical College, Deir El-Ballah, Gaza Strip, Palestine;American University-Egypt, Cairo, Egypt;Cairo University, Giza, Egypt
Venue:
ACST '08 Proceedings of the Fourth IASTED International Conference on Advances in Computer Science and Technology
Year:
2008

Abstract

Web content outliers are Web documents whose content differs from that of other Web documents in the same category. Mining Web content outliers can be used to identify competitors and emerging business patterns in e-commerce, and to clean corpora used in Web document classification. This paper proposes a k-nearest neighbors approach (FindWDO) for detecting Web document outliers. Experimental results show that FindWDO outperforms a similar algorithm in the same domain.
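
The abstract does not spell out how FindWDO works, so the following is only a generic sketch of k-nearest-neighbor outlier scoring for documents, not the authors' algorithm. It assumes plain term-frequency vectors, cosine distance, and a score equal to the mean distance to the k nearest neighbors; the function names, the toy documents, and the choice k=2 are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a document (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def knn_outlier_scores(docs, k):
    """Score each document by its mean distance to its k nearest neighbors.

    Assumption: a higher score means the document is more likely an outlier.
    """
    if not 1 <= k < len(docs):
        raise ValueError("k must satisfy 1 <= k < number of documents")
    vectors = [tf_vector(d) for d in docs]
    scores = []
    for i, v in enumerate(vectors):
        # Distances to every other document, nearest first.
        dists = sorted(
            cosine_distance(v, w) for j, w in enumerate(vectors) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical toy category: three travel pages and one unrelated page.
docs = [
    "cheap flights hotel booking travel deals",
    "travel deals flights hotel discount booking",
    "hotel booking travel flights cheap discount",
    "quantum chromodynamics lattice gauge theory",
]
scores = knn_outlier_scores(docs, k=2)
outlier_index = max(range(len(docs)), key=scores.__getitem__)
print(outlier_index)
```

In this toy category the last document shares no terms with the others, so it receives the highest score. A real system would likely use weighted terms and a tuned k, which this sketch leaves out.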