In web classification, most researchers assume that the objects to classify are individual web pages from one or more web sites. In practice, this assumption is too restrictive, since a single web page does not always correspond to an instance of a semantic concept (or category) in the classification task. In this paper, we relax this assumption and allow a concept instance to be represented by a subgraph of web pages or a set of web pages. We identify several new issues that arise when the assumption is removed, and formulate the web unit mining problem. We also propose an iterative web unit mining (iWUM) method that first finds subgraphs of web pages using knowledge about web site structure. From these web subgraphs, web units are constructed and classified into semantic concepts (or categories) in an iterative manner. Our experiments on the WebKB dataset show that iWUM improves overall classification performance and works very well on the more structured parts of a web site.
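To make the idea of a web unit concrete, the following is a minimal sketch, not the iWUM method itself. It assumes that pages sharing a host and parent directory form one candidate unit (a stand-in for the site-structure knowledge mentioned above) and labels each unit with a toy keyword count over the pooled text of its pages. The names `build_web_units`, `classify_unit`, and `KEYWORDS`, the example URLs, and the keyword lists are all hypothetical, and the iterative refinement step is omitted because the abstract does not specify it.

```python
from collections import defaultdict
from urllib.parse import urlparse
import posixpath

# Hypothetical keyword lists; a real system would train a classifier instead.
KEYWORDS = {
    "faculty": {"professor", "research", "publications"},
    "course": {"syllabus", "lecture", "homework"},
}


def build_web_units(pages):
    """Group pages (url -> text) into candidate units by host and parent directory.

    Assumption: the URL directory is used as a crude proxy for site structure.
    """
    units = defaultdict(dict)
    for url, text in pages.items():
        parts = urlparse(url)
        directory = posixpath.dirname(parts.path) or "/"
        units[(parts.netloc, directory)][url] = text
    return dict(units)


def classify_unit(unit_pages):
    """Label a unit by counting keyword hits over the pooled text of its pages."""
    tokens = " ".join(unit_pages.values()).lower().split()
    scores = {
        label: sum(token in words for token in tokens)
        for label, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Return None when no keyword matched, rather than guessing a label.
    return best if scores[best] > 0 else None


# Toy input: two pages that together describe one person, and one course page.
pages = {
    "http://cs.example.edu/~alice/index.html": "Alice is a professor",
    "http://cs.example.edu/~alice/pubs.html": "Selected publications and research",
    "http://cs.example.edu/courses/cs101/index.html": "Syllabus and lecture notes",
}

units = build_web_units(pages)
labels = {key: classify_unit(unit) for key, unit in units.items()}
```

In this toy input the two pages under `/~alice` are pooled into a single unit before labeling, which is the point of classifying a set of pages rather than each page on its own.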