Framework for mining web content outliers

Authors:
Malik Agyemang; Ken Barker; Reda Alhajj
Affiliations:
University of Calgary, Calgary, Alberta, Canada; University of Calgary, Calgary, Alberta, Canada; University of Calgary, Calgary, Alberta, Canada
Venue:
Proceedings of the 2004 ACM symposium on Applied computing
Year:
2004

Abstract

Outliers are data objects whose characteristics differ from those of other data objects. Exploring diverse and dynamic web data for outliers is more interesting than finding outliers in numeric data sets. However, existing web mining algorithms have concentrated on finding frequent patterns while discarding the less frequent ones, which are likely to contain the outlying data. This paper refers to outliers present on the web as web outliers to distinguish them from traditional outliers. Web outliers are data objects whose characteristics differ significantly from those of other web data. Although the presence of web outliers appears obvious, there is neither a formal definition of web outliers nor an algorithm for mining them. Moreover, traditional outlier mining algorithms, designed solely for numeric data sets, are inappropriate for mining web outliers. This paper establishes the presence of web outliers and discusses some practical applications of web outlier mining. Finally, we present a taxonomy of web outliers and propose a general framework for mining web content outliers.