Keyphrase extraction for labeling a website topic hierarchy

  • Authors:
  • Nan Liu; Christopher C. Yang

  • Affiliations:
  • Chinese University of Hong Kong, Shatin, Hong Kong; Drexel University, Philadelphia, PA

  • Venue:
  • Proceedings of the 11th International Conference on Electronic Commerce
  • Year:
  • 2009

Abstract

Looking through web pages to find useful information on a website is tedious and time-consuming. Search engines are not always helpful because of the vocabulary mismatch between queries and web pages. Users may also have difficulty expressing their information needs accurately as queries at the beginning of the exploration stage. A site map provides an outline of a website's overall structure, so users can identify the exact web page containing the information they need without navigating through the website from the root page. However, site maps are not always available. In our previous work, we developed techniques to generate a website topic hierarchy. In this paper, we extend that work by extracting keyphrases to label the website topic hierarchy. The keyphrases summarize the content so that users can efficiently browse the site map and pinpoint the web page that provides the information they need. The proposed keyphrase extraction has three major components. The first component identifies candidate phrases. The second component computes feature scores for summarization, using both thematic and presentation features. The third component extracts the keyphrases by combining the feature scores. We conducted an experiment and obtained promising results.
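The three-component pipeline described above can be illustrated with a minimal sketch. The abstract does not define the actual features or the combination rule, so everything concrete here is an assumption and is not the authors' method. Candidates are word n-grams that do not start or end with a stopword. The thematic score is a stand-in: phrase frequency normalized by the most frequent candidate. The presentation score is also a stand-in: 1 if the phrase appears in emphasized text such as a page title or heading, else 0. The final score is a linear weighted sum. The names `candidate_phrases`, `thematic_scores`, `presentation_scores`, `extract_keyphrases`, the stopword list, and the weights `w_theme`/`w_pres` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with", "or", "is", "are"}


def tokenize(text):
    """Lowercase word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_phrases(text, max_len=3):
    """Component 1 (assumed): word n-grams of length 1..max_len that
    neither start nor end with a stopword. Repeats are kept so that
    frequency can be counted later."""
    words = tokenize(text)
    cands = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            cands.append(" ".join(gram))
    return cands


def thematic_scores(cands):
    """Component 2a (stand-in thematic feature): frequency normalized
    by the frequency of the most common candidate."""
    counts = Counter(cands)
    top = max(counts.values())
    return {p: c / top for p, c in counts.items()}


def presentation_scores(cands, emphasized_text):
    """Component 2b (stand-in presentation feature): 1.0 if the phrase
    occurs as whole words in emphasized text (e.g. title, headings)."""
    emph = " " + " ".join(tokenize(emphasized_text)) + " "
    return {p: 1.0 if f" {p} " in emph else 0.0 for p in set(cands)}


def extract_keyphrases(body, emphasized, k=3, w_theme=0.6, w_pres=0.4):
    """Component 3 (assumed): rank candidates by a weighted sum of the
    feature scores; ties are broken alphabetically."""
    cands = candidate_phrases(body)
    if not cands:
        return []
    theme = thematic_scores(cands)
    pres = presentation_scores(cands, emphasized)
    combined = {p: w_theme * theme[p] + w_pres * pres[p] for p in theme}
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:k]]


if __name__ == "__main__":
    body = "Digital camera reviews. Compare digital camera prices and read camera reviews."
    title = "Digital Camera Reviews"
    print(extract_keyphrases(body, title, k=3))
```

A linear combination keeps each feature's contribution easy to inspect and tune. Whether the original system weights, normalizes, or learns the combination differently is not stated in the abstract.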