KACTL: knowware based automated construction of a treelike library from web documents

Authors:
Ruqian Lu; Yu Huang; Kai Sun; Zhongxiang Chen; Yiwen Chen; Songmao Zhang
Affiliations:
Institute of Computing Technology, CAS Key Lab of IIP, China and Academy of Mathematics and Systems Science, CAS Key Lab of MADIS, China; Institute of Computing Technology, CAS Key Lab of IIP, China; Academy of Mathematics and Systems Science, CAS Key Lab of MADIS, China; Institute of Computing Technology, CAS Key Lab of IIP, China; Tianjin University, China; Institute of Computing Technology, CAS Key Lab of IIP, China and Academy of Mathematics and Systems Science, CAS Key Lab of MADIS, China
Venue:
WISM'12 Proceedings of the 2012 international conference on Web Information Systems and Mining
Year:
2012

Abstract

This paper proposes a knowware-based supervised machine learning technique for domain-specific regression and classification of Web documents. The technique is simple: it relies only on word counting, with no natural language understanding or complicated statistical techniques. The algorithm starts by constructing a domain sub-division tree and assigning a training set of documents to its nodes, and then produces a labeled classification tree with a characteristic vector for each node. This tree can then be used to classify any number of documents in that particular domain. A tool for developing Web portals is also provided, which builds a Web station for displaying the final treelike library of documents.
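
The abstract does not spell out how the characteristic vectors are computed or how a document is matched against them, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a node's characteristic vector is the summed word counts of the training documents in its subtree, and that a new document is routed top-down to the child with the highest cosine similarity. The names `Node`, `word_counts`, `cosine`, `train`, and `classify`, as well as the toy "sports" tree, are hypothetical and chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
import math
import re


def word_counts(text):
    """Count lowercase word tokens; no linguistic analysis is attempted."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Node:
    """One node of the (hypothetical) domain sub-division tree."""
    label: str
    children: list = field(default_factory=list)
    training_docs: list = field(default_factory=list)
    vector: Counter = field(default_factory=Counter)


def train(node):
    """Assumption: a node's vector is the summed word counts of its subtree."""
    node.vector = Counter()
    for doc in node.training_docs:
        node.vector.update(word_counts(doc))
    for child in node.children:
        node.vector.update(train(child))
    return node.vector


def classify(root, text):
    """Descend from the root, following the most similar child at each level."""
    doc = word_counts(text)
    node, path = root, [root.label]
    while node.children:
        best = max(node.children, key=lambda c: cosine(doc, c.vector))
        if cosine(doc, best.vector) == 0.0:
            break  # no child shares any word with the document
        node = best
        path.append(node.label)
    return path


# Toy tree with one training document per leaf (illustrative data only).
root = Node("sports", children=[
    Node("football", training_docs=["goal striker penalty goal keeper"]),
    Node("tennis", training_docs=["serve racket volley serve net"]),
])
train(root)
print(classify(root, "the striker scored a late goal"))  # ['sports', 'football']
```

Stopping the descent when no child shares a word with the document is a design choice of this sketch; it leaves such documents at the deepest node that still matches rather than forcing them into an arbitrary leaf.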