Applying semantic links for classifying web pages

Authors:
Ben Choi;Qing Guo
Affiliations:
Computer Science, College of Engineering and Science, Louisiana Tech University, Ruston, LA;Computer Science, College of Engineering and Science, Louisiana Tech University, Ruston, LA
Venue:
IEA/AIE'2003 Proceedings of the 16th international conference on Developments in applied artificial intelligence
Year:
2003

Abstract

Automatic hypertext classification is an essential technique for organizing the vast number of Web pages and HTML documents on the Internet. One of the problems in classifying Web pages is that they are usually short and contain too little text to clearly identify their category. Text classification mechanisms that analyze only the contents of the document itself are therefore relatively ineffective at classifying short Web pages. This paper proposes a new hypertext classification mechanism that addresses the problem by analyzing not only the Web page itself but also the Web pages it links to through the URLs it contains. The URLs are treated as semantic links. The hypothesis is that the linked Web pages contain related information that helps identify the category of the Web page. Experimental results show that the proposed approach can increase accuracy by 35% over the approach of analyzing only the Web page itself.
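
The general idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how linked pages are weighted or which classifier is used, so everything below is an assumption made for illustration. That includes the names `LinkAndTextParser`, `term_counts`, `combined_vector`, `cosine` and `classify`, the `link_weight` parameter, the use of raw term counts, and the choice of cosine similarity against per-category term vectors. Pages are passed in as a dictionary from URL to HTML so that the example needs no network access.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects text and <a href> targets from one HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Simplification: script and style contents are not filtered out.
        self.text_parts.append(data)


def term_counts(html):
    """Return (term counts, outgoing links) for one page."""
    parser = LinkAndTextParser()
    parser.feed(html)
    words = re.findall(r"[a-z]+", " ".join(parser.text_parts).lower())
    return Counter(words), parser.links


def combined_vector(url, pages, link_weight=0.5):
    """Terms of the page itself plus down-weighted terms of the pages it links to.

    link_weight is an assumed parameter, not a value taken from the paper.
    """
    counts, links = term_counts(pages[url])
    vector = Counter({term: float(c) for term, c in counts.items()})
    for link in links:
        if link in pages:  # skip links whose content is not available
            linked_counts, _ = term_counts(pages[link])
            for term, c in linked_counts.items():
                vector[term] += link_weight * c
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(vector, category_vectors):
    """Pick the category whose term vector is most similar to the page vector."""
    return max(category_vectors, key=lambda c: cosine(vector, category_vectors[c]))
```

Calling `classify(combined_vector(url, pages), category_vectors)` with `link_weight=0` reduces to analyzing only the page itself, which makes it straightforward to compare the two settings on the same data.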