Bayesian web document classification through optimizing association word

Authors:
Su Jeong Ko;Jun Hyeog Choi;Jung Hyun Lee
Affiliations:
School of Computer Science & Engineering, Inha University Yong_hyen dong, Namgu, Inchon, Korea;Division of Computer Science, Kimpo College, Kimpo, Kyonggi-do, Korea;School of Computer Science & Engineering, Inha University Yong_hyen dong, Namgu, Inchon, Korea
Venue:
IEA/AIE'2003 Proceedings of the 16th international conference on Developments in applied artificial intelligence
Year:
2003

Abstract

Previous Bayesian document classification methods do not accurately reflect semantic relations when expressing the characteristics of a document. To resolve this problem, this paper proposes a Bayesian document classification method based on mining and refining association words. The Apriori algorithm extracts the characteristics of a test document in the form of association words that reflect semantic relations, and it also mines association words from the learning documents. If association words are mined from the learning documents with the Apriori algorithm alone, inappropriate association words are included among them, which reduces classification accuracy. To compensate for this disadvantage, we refine the association words with a genetic algorithm. A Naïve Bayes classifier then classifies test documents based on the refined association words.
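The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not specify: association words are limited to word pairs, support is measured as the fraction of a class's learning documents containing the pair, the genetic-algorithm refinement step is omitted entirely, and the Naïve Bayes score uses only the association words present in the test document, with add-one smoothing. All function names, the `min_support` value, and the toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mine_association_words(docs, min_support=0.6):
    """Apriori restricted to word pairs (an assumption of this sketch)."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    # Level 1: frequent single words.
    word_counts = Counter(w for s in doc_sets for w in s)
    frequent = {w for w, c in word_counts.items() if c / n >= min_support}
    # Level 2: candidate pairs are built only from frequent words (Apriori pruning).
    pair_counts = Counter()
    for s in doc_sets:
        for pair in combinations(sorted(s & frequent), 2):
            pair_counts[pair] += 1
    return {p for p, c in pair_counts.items() if c / n >= min_support}


def train(labeled_docs, association_words):
    """Count, per class, how many learning documents contain each association word."""
    class_counts = Counter()
    feature_counts = {}
    for label, words in labeled_docs:
        class_counts[label] += 1
        present = set(words)
        counts = feature_counts.setdefault(label, Counter())
        for aw in association_words:
            if present.issuperset(aw):
                counts[aw] += 1
    return class_counts, feature_counts


def classify(words, association_words, class_counts, feature_counts):
    """Pick the class with the highest log prior plus smoothed log likelihoods."""
    present = set(words)
    active = [aw for aw in association_words if present.issuperset(aw)]
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for aw in active:
            # Add-one smoothing over the two outcomes (present / absent).
            score += math.log((feature_counts[label][aw] + 1) / (n_c + 2))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy learning documents, already tokenized.
training = [
    ("game", ["game", "player", "score", "team"]),
    ("game", ["game", "player", "level", "score"]),
    ("finance", ["stock", "market", "price", "trade"]),
    ("finance", ["stock", "market", "bank", "price"]),
]

# Mine association words separately for each class, then pool them.
by_class = {}
for label, words in training:
    by_class.setdefault(label, []).append(words)
association_words = set()
for docs in by_class.values():
    association_words |= mine_association_words(docs)

class_counts, feature_counts = train(training, association_words)
predicted = classify(["player", "score", "game", "bonus"],
                     association_words, class_counts, feature_counts)
print(predicted)
```

In a fuller version, a refinement step would sit between mining and training and discard association words judged inappropriate. The abstract states that a genetic algorithm plays this role but gives no fitness function or encoding, so that step is left out here rather than guessed.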