Web content outliers are Web documents whose content differs from that of other Web documents in the same category. Mining Web content outliers can help identify competitors and emerging business patterns in e-commerce, and can be used to clean corpora for Web document classification. This paper proposes a k-nearest-neighbors approach (FindWDO) for detecting Web document outliers. Experimental results show that FindWDO outperforms a similar algorithm in the same domain.
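To make the general idea of k-nearest-neighbor outlier scoring concrete, here is a minimal sketch. It is not the FindWDO algorithm, whose details are not given in this abstract. It assumes a generic scheme: documents are turned into TF-IDF vectors, and each document is scored by its mean cosine distance to its k nearest neighbors in the same category. The function names (`tfidf_vectors`, `cosine_distance`, `knn_outlier_scores`), the whitespace tokenizer, and the smoothed IDF formula are all illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors (term -> weight dicts).

    Assumption: simple lowercase whitespace tokenization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_outlier_scores(docs, k=2):
    """Score each document by its mean distance to its k nearest neighbors.

    Higher scores mean the document is less similar to the rest of the set.
    """
    if len(docs) < 2 or k < 1:
        raise ValueError("need at least two documents and k >= 1")
    vectors = tfidf_vectors(docs)
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(
            cosine_distance(v, u) for j, u in enumerate(vectors) if j != i
        )
        nearest = dists[:k]  # fewer than k if the set is very small
        scores.append(sum(nearest) / len(nearest))
    return scores


# Hypothetical documents from one "travel" category, plus one off-topic page.
docs = [
    "cheap flights hotel booking travel deals",
    "travel deals flights airline booking",
    "hotel booking travel flights discount",
    "quantum chromodynamics lattice gauge theory",
]
scores = knn_outlier_scores(docs, k=2)
outlier_index = scores.index(max(scores))  # index of the most outlying document
```

In this toy set the last document shares no terms with the others, so it receives the highest score. A real system would likely need better tokenization, stop-word handling, and a principled threshold for flagging outliers.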