Navigating through the hyperlinks of a Web site to find information on one of its pages can be inefficient and ineffective without the support of a site map. The content of a Web site is usually organized with an inherent structure such as a topic hierarchy, a directed tree rooted at the site's homepage whose vertices and edges correspond to Web pages and hyperlinks, but this hierarchy is not always available to the user. In this work, we study the problem of automatically generating the topic hierarchy of a Web site. We model a Web site's link structure as a weighted directed graph and propose methods for estimating edge weights based on eight types of features and three learning algorithms: decision trees, naïve Bayes classifiers, and logistic regression. Three graph algorithms, namely breadth-first search, shortest-path search, and directed minimum spanning tree, are adapted to generate the topic hierarchy from the graph model. We tested the model and algorithms on real Web sites. The directed minimum spanning tree algorithm, with the decision tree as the weight-learning algorithm, achieved the highest performance, with an average accuracy of 91.9%. © 2009 Wiley Periodicals, Inc.
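As a rough illustration of the graph model, the sketch below builds a parent map (a tree rooted at the homepage) from a weighted directed link graph using two of the three strategies named above: breadth-first search and shortest-path search. It is a minimal sketch under stated assumptions, not the authors' implementation: the page names and weights in `links` are invented, the function names are hypothetical, and each weight is assumed to be a cost where a lower value means the link is more likely a true parent-child edge. The directed minimum spanning tree variant (Edmonds' algorithm) is omitted for brevity.

```python
import heapq
from collections import deque

# Hypothetical link graph: links[u][v] is the assumed cost of hyperlink u -> v
# (lower cost = more likely to be a parent-child edge in the topic hierarchy).
links = {
    "home": {"products": 0.1, "about": 0.2, "widget": 0.9},
    "products": {"widget": 0.1, "home": 0.5},
    "about": {"home": 0.5, "team": 0.1},
    "widget": {"home": 0.5},
    "team": {},
}

def bfs_hierarchy(graph, root):
    """Ignore weights: a page's parent is the first page that reaches it in BFS order."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for child in graph.get(page, {}):
            if child not in parent:
                parent[child] = page
                queue.append(child)
    return parent

def shortest_path_hierarchy(graph, root):
    """Dijkstra: a page's parent is its predecessor on the cheapest path from the root."""
    dist = {root: 0.0}
    parent = {root: None}
    heap = [(0.0, root)]
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist[page]:
            continue  # stale heap entry; a cheaper path was already found
        for child, cost in graph.get(page, {}).items():
            new_dist = d + cost
            if child not in dist or new_dist < dist[child]:
                dist[child] = new_dist
                parent[child] = page
                heapq.heappush(heap, (new_dist, child))
    return parent

if __name__ == "__main__":
    print(bfs_hierarchy(links, "home"))
    print(shortest_path_hierarchy(links, "home"))
```

On this toy graph the two strategies disagree about the "widget" page: breadth-first search attaches it directly to the homepage because a direct link exists, while the shortest-path tree places it under "products" because that route has a lower total cost. This is the kind of difference that learned edge weights are meant to resolve.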