Mining outliers from large datasets is like finding needles in a haystack. Sifting through dynamic, unstructured, and ever-growing web data for outliers is even more challenging. This paper presents HyCOQ, a hybrid algorithm that draws on the strengths of both n-gram-based and word-based systems. Experimental results show that using embedded motifs without a dictionary yields a significant improvement over using a domain dictionary, irrespective of the type of data used (words, n-grams, or hybrid). Recall also improves markedly with hybrid documents compared with raw words or n-grams used without a domain dictionary.
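The abstract does not spell out how HyCOQ combines the two representations, so the following is only an illustrative sketch of the general idea of a hybrid document profile, not the HyCOQ algorithm itself. It assumes character 5-grams taken within words, and it ranks documents by mean Jaccard dissimilarity to the rest of the collection, a measure chosen here for simplicity. All function names (`word_tokens`, `char_ngrams`, `hybrid_profile`, `dissimilarity`, `rank_outliers`) are hypothetical.

```python
import re
from collections import Counter


def word_tokens(text):
    """Lowercase word tokens (runs of letters and digits)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def char_ngrams(text, n=5):
    """Overlapping character n-grams taken within each word.

    Words of length n or shorter are kept whole (an assumption of this sketch).
    """
    grams = []
    for word in word_tokens(text):
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def hybrid_profile(text, n=5):
    """Frequency profile mixing whole words and n-grams.

    Prefixes keep the two feature kinds apart: "w:" for words, "g:" for n-grams.
    """
    profile = Counter("w:" + w for w in word_tokens(text))
    profile.update("g:" + g for g in char_ngrams(text, n))
    return profile


def dissimilarity(a, b):
    """One minus the Jaccard overlap of the two profiles' feature sets."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return 1.0 - len(keys_a & keys_b) / len(union)


def rank_outliers(docs, n=5):
    """Document indices sorted by mean dissimilarity to the others, highest first."""
    profiles = [hybrid_profile(d, n) for d in docs]
    scores = []
    for i, p in enumerate(profiles):
        others = [dissimilarity(p, q) for j, q in enumerate(profiles) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "data mining finds patterns in data",
        "mining data patterns from large data",
        "chocolate cake recipe with butter",
    ]
    # The document sharing no words or n-grams with the others is ranked first.
    print(rank_outliers(docs))
```

In this sketch no domain dictionary is consulted: a document is scored only against the other documents in the collection, which loosely mirrors the dictionary-free setting the abstract describes.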