Framework for mining web content outliers
Proceedings of the 2004 ACM symposium on Applied computing
Exception mining in large datasets is an important task in traditional data mining, with applications in credit card fraud detection, weather prediction, intrusion detection, and cellular phone cloning fraud detection, among others. Sifting through dynamic, unstructured, and ever-growing web data for outliers is more challenging than finding outliers in numeric datasets. Existing outlier mining algorithms, however, are restricted to numeric datasets, which leaves web outlier mining an open research issue. Web outliers are web data that show significantly different characteristics from other web data taken from the same category. Although the presence of web outliers appears obvious, algorithms for mining them are currently unavailable. Moreover, traditional outlier mining algorithms designed solely for numeric datasets cannot be applied to web datasets, because web data typically contain multimedia content. This paper establishes the presence of such outliers on the web, called web outliers, and proposes a general framework for mining them. It also reports a web outlier taxonomy that supports the development of content-specific algorithms for mining web outliers. Finally, we propose the WCO-Mine algorithm for mining web content outliers. Experimental results demonstrate that WCO-Mine is capable of finding web outliers in web datasets.
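The abstract does not describe WCO-Mine's internals. As a rough illustration of the definition of a web content outlier (a document whose content differs markedly from other documents in the same category), the following sketch scores each document by its mean cosine dissimilarity to the rest of its category, using TF-IDF term weights, and ranks the highest-scoring documents as outlier candidates. This is not WCO-Mine: the whitespace tokenization, the smoothed IDF formula, the mean-dissimilarity score, and the helper name `rank_content_outliers` are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document.

    Assumption for this sketch (not from the paper): a smoothed IDF,
    log((1 + n) / (1 + df)) + 1, so that every weight is positive.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        vectors.append({
            term: (count / length) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_content_outliers(docs, top_n=1):
    """Hypothetical helper: score each document by its mean cosine
    dissimilarity to the other documents in the same category and
    return (indices of the top_n highest scores, all scores)."""
    vectors = tfidf_vectors(docs)
    scores = []
    for i, vi in enumerate(vectors):
        dissims = [1.0 - cosine(vi, vj) for j, vj in enumerate(vectors) if j != i]
        scores.append(sum(dissims) / len(dissims) if dissims else 0.0)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n], scores


if __name__ == "__main__":
    # Toy category of pages about cardiology, plus one off-topic page.
    category = [
        "heart disease treatment cardiology patients".split(),
        "cardiology patients heart surgery treatment".split(),
        "heart patients cardiology clinic treatment".split(),
        "football league match score goals".split(),
    ]
    top, scores = rank_content_outliers(category, top_n=1)
    print(top, [round(s, 3) for s in scores])
```

On this toy category the off-topic football page shares no terms with the others, so its mean dissimilarity is exactly 1.0 and it is ranked first. A mean-dissimilarity score is only one simple, distance-based choice; a real web content outlier miner would also need to handle the multimedia and structural content the abstract mentions.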