Hierarchical models are commonly used to organize a Website's content. A Website's content structure can be represented by a topic hierarchy: a directed tree rooted at the Website's homepage, in which the vertices correspond to Web pages and the edges to hyperlinks. In this work, we propose a new method for constructing the topic hierarchy of a Website. We model the Website's link structure as a weighted directed graph, in which each edge weight is computed by a classifier that predicts whether the edge connects a pair of nodes representing a topic and its sub-topic. We then pose the problem of building the topic hierarchy as finding the shortest-path tree and the directed minimum spanning tree in the weighted graph. We conducted extensive experiments on real Websites and obtained very promising results.
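As a rough illustration of the shortest-path-tree formulation, the following Python sketch builds a tree rooted at the homepage with Dijkstra's algorithm. It is not the authors' implementation. The page names and classifier scores are hypothetical, and the conversion of a score p into an edge weight as 1 - p is an assumption made here for illustration; the abstract does not say how the weights are derived from the classifier output. The directed minimum spanning tree variant is not shown.

```python
import heapq


def shortest_path_tree(edges, root):
    """Return {node: parent} for the shortest-path tree rooted at `root`.

    `edges` maps (source, target) pairs to non-negative weights.
    Nodes unreachable from `root` are left out of the result.
    """
    # Build an adjacency list from the edge dictionary.
    adjacency = {}
    for (src, dst), weight in edges.items():
        adjacency.setdefault(src, []).append((dst, weight))

    dist = {root: 0.0}
    parent = {root: None}
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in adjacency.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return parent


# Hypothetical classifier scores: estimated probability that the
# hyperlink goes from a topic page to one of its sub-topic pages.
scores = {
    ("home", "products"): 0.9,
    ("home", "about"): 0.8,
    ("home", "laptops"): 0.2,      # shortcut link from the homepage
    ("products", "laptops"): 0.9,
    ("about", "laptops"): 0.1,
    ("laptops", "home"): 0.05,     # back link to the homepage
}

# Assumption: a confident topic/sub-topic link should be cheap to follow.
weights = {edge: 1.0 - p for edge, p in scores.items()}

tree = shortest_path_tree(weights, "home")
for page, parent_page in tree.items():
    print(page, "<-", parent_page)
```

In this toy graph the page "laptops" is attached under "products" rather than directly under "home", because the two confident links along that path cost less in total than the single low-scoring shortcut link.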