Web mining from competitors' websites

Authors:
Xin Chen; Yi-fang Brook Wu
Affiliations:
New Jersey Institute of Technology, Newark, NJ (both authors)
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Abstract

This paper presents a framework for user-oriented text mining and illustrates it with an example: discovering knowledge from competitors' websites. The knowledge to be discovered takes the form of association rules. A user's background knowledge is represented as a concept hierarchy built from the documents on his or her own website. The hierarchy captures how words are used semantically and how they relate to one another in those background documents. Association rules are mined among the noun phrases extracted from documents on competitors' websites. Their interestingness is measured by novelty, defined as the semantic distance between a rule's antecedent and consequent within the background knowledge and computed from the co-occurrence frequency of words and the lengths of the connections between words in the concept hierarchy. In a user evaluation of the novelty of the discovered rules, the correlation between the algorithm's scores and the human judges' ratings was comparable to the correlation among the human judges themselves.
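The abstract names the ingredients of the pipeline (association rules over competitor noun phrases, a concept hierarchy from the user's own site, novelty from co-occurrence and connection length) but not the exact formulas. The following is a minimal, hypothetical sketch of how those pieces could fit together, not the authors' method. Everything here is an assumption: the function names `mine_pair_rules`, `path_length` and `novelty`; the restriction to single-item rules A -> B; the `min_support`/`min_confidence` thresholds; the hierarchy given as a child-to-parent dict; and the scoring rule `length / (length + 1) * (1 - cooc)` averaged over word pairs.

```python
from collections import deque
from itertools import combinations


def mine_pair_rules(docs, min_support=0.5, min_confidence=0.6):
    """Mine single-item association rules A -> B among noun phrases.

    docs: list of sets, each holding the noun phrases of one competitor document.
    Returns (antecedent, consequent, support, confidence) tuples.
    """
    n = len(docs)
    item_count, pair_count = {}, {}
    for phrases in docs:
        for p in phrases:
            item_count[p] = item_count.get(p, 0) + 1
        # Count each unordered pair once per document.
        for a, b in combinations(sorted(phrases), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for ante, cons in ((a, b), (b, a)):
            confidence = count / item_count[ante]
            if confidence >= min_confidence:
                rules.append((ante, cons, support, confidence))
    return rules


def path_length(hierarchy, x, y):
    """Fewest parent/child links between words x and y in the hierarchy.

    hierarchy: hypothetical dict mapping child word -> parent word.
    Returns None if either word is absent or the two are not connected.
    """
    graph = {}
    for child, parent in hierarchy.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    if x not in graph or y not in graph:
        return None
    seen, queue = {x}, deque([(x, 0)])
    while queue:  # breadth-first search over the undirected hierarchy
        node, dist = queue.popleft()
        if node == y:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def novelty(antecedent, consequent, hierarchy, background_docs):
    """Hypothetical novelty score in [0, 1] for the rule antecedent -> consequent."""
    bg = [set(doc) for doc in background_docs]  # word sets of the user's own documents
    scores = []
    for u in antecedent.split():
        for v in consequent.split():
            length = path_length(hierarchy, u, v)
            if length is None:
                scores.append(1.0)  # unknown to the background: maximally novel
                continue
            distance = length / (length + 1)  # 0 for the same word, tends to 1 as paths grow
            fu = sum(u in d for d in bg)
            fv = sum(v in d for d in bg)
            both = sum(u in d and v in d for d in bg)
            cooc = both / min(fu, fv) if min(fu, fv) else 0.0
            scores.append(distance * (1 - cooc))  # frequent co-occurrence lowers novelty
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with the toy hierarchy `{"wireless": "network", "network": "technology", "security": "technology", "cloud": "technology"}`, the rule `cloud -> wireless network` scores as more novel than `security -> wireless network` when "security" and "network" already co-occur in the user's own documents. Multiplying a path-based distance by one minus a co-occurrence ratio is just one simple way to let both signals named in the abstract push the score down; the paper's actual combination may differ.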