Web outlier mining: Discovering outliers from web datasets

Authors:
Malik Agyemang; Ken Barker; Reda Alhajj
Affiliations:
Department of Computer Science, University of Calgary, 2500 University Drive N.W., Calgary, AB, Canada T2N 1N4. E-mail: agyemang,barker,alhajj@cpsc.ucalgary.ca
Venue:
Intelligent Data Analysis
Year:
2005

Abstract

Exception (outlier) mining in large datasets is an important task in traditional data mining, with applications in credit card fraud detection, weather prediction, intrusion detection, and cellular phone cloning fraud detection, among others. Sifting through dynamic, unstructured, and ever-growing web data for outliers is more challenging than finding outliers in numeric datasets. However, existing outlier mining algorithms are restricted to numeric datasets, leaving web outlier mining an open research issue. Web outliers are web data that show significantly different characteristics from other web data in the same category. Although the presence of web outliers appears obvious, algorithms for mining them are currently unavailable. Moreover, traditional outlier mining algorithms, designed solely for numeric data, cannot be applied to web datasets, which typically contain multimedia. This paper establishes the presence of outliers on the web, called web outliers, and proposes a general framework for mining them. We also report a web outlier taxonomy that supports the development of content-specific algorithms for mining web outliers. Finally, we propose the WCO-Mine algorithm for mining web content outliers. Experimental results demonstrate that WCO-Mine can find web outliers in web datasets.
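The abstract defines a web outlier only informally, as web data whose characteristics differ significantly from other web data in the same category. To make that definition concrete for textual content, here is a minimal sketch under stated assumptions. It is not the WCO-Mine algorithm, whose steps are not described here. It assumes each page has already been reduced to plain text, represents it as a raw term-frequency vector, and compares it with the other pages of its category using cosine similarity. A page whose mean similarity falls below a threshold is flagged. The function names (`tokenize`, `cosine`, `web_content_outliers`) and the `threshold` value are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    # Counter returns 0 for missing terms, so unshared terms add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def web_content_outliers(docs, threshold=0.2):
    """Hypothetical illustration, NOT WCO-Mine: return the indices of documents
    whose mean cosine similarity to the other documents of the same category
    is below `threshold` (an assumed, illustrative cut-off)."""
    vectors = [Counter(tokenize(d)) for d in docs]
    outliers = []
    for i, v in enumerate(vectors):
        sims = [cosine(v, w) for j, w in enumerate(vectors) if j != i]
        # A lone document has nothing to differ from, so it is never flagged.
        mean_sim = sum(sims) / len(sims) if sims else 1.0
        if mean_sim < threshold:
            outliers.append(i)
    return outliers


# Toy "fraud detection" category with one off-topic page.
docs = [
    "credit card fraud detection with outlier mining",
    "outlier mining for credit card fraud",
    "detecting fraud using outlier detection methods",
    "chocolate cake recipe and fresh strawberries",
]
print(web_content_outliers(docs))  # → [3]
```

In this toy category the last document shares no terms with the others and is the only one flagged. Raw term counts keep the sketch short. A term-weighting scheme such as TF-IDF could be substituted to reduce the influence of very common words.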