Web Snippet Clustering Based on Text Enrichment with Concept Hierarchy

  • Authors:
  • Supakpong Jinarat;Choochart Haruechaiyasak;Arnon Rungsawang

  • Affiliations:
  • Massive Information & Knowledge Engineering, Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand 10900;Human Language Technology Laboratory (HLT), National Electronics and Computer Technology Center (NECTEC), Pathumthani, Thailand 12120;Massive Information & Knowledge Engineering, Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand 10900

  • Venue:
  • ICONIP '09 Proceedings of the 16th International Conference on Neural Information Processing: Part II
  • Year:
  • 2009

Abstract

Clustering the web snippets returned by a search engine helps users browse and navigate the results. Because web snippets are extremely short, traditional clustering techniques that adopt the bag-of-words model often yield unsatisfactory results. In this paper, we propose a text enrichment method for improving the performance of web snippet clustering. The main idea is to expand the original snippets with related conceptual terms. We use the Open Directory Project (ODP), a human-organized web taxonomy, to provide the concept hierarchy of web content. Using a test data set of 240 queries, we performed experiments with two clustering techniques: K-means clustering as the non-overlapping approach and Suffix Tree Clustering (STC) as the overlapping approach. With the proposed text enrichment method, K-means clustering yielded an overall performance improvement of up to 15.51% based on the F1 measure. Suffix Tree Clustering with text enrichment improved performance by up to 53.71%.
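The enrichment idea can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny `CONCEPT_PATHS` table is a hypothetical stand-in for ODP categories, and the helper names `tokenize`, `enrich`, and `cosine` are assumptions made for illustration. The sketch only shows why appending concept terms can help a bag-of-words model on very short texts: two snippets with no words in common gain shared features once their concept terms are added.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy hierarchy: term -> concept path from root to leaf.
# It stands in for ODP categories; the real term-to-category mapping
# is not described in the abstract.
CONCEPT_PATHS = {
    "jaguar": ["Recreation", "Autos"],
    "sedan": ["Recreation", "Autos"],
    "python": ["Computers", "Programming"],
    "compiler": ["Computers", "Programming"],
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def enrich(snippet):
    """Return the snippet's tokens plus concept terms for each known token."""
    tokens = tokenize(snippet)
    concepts = []
    for tok in tokens:
        # Prefix concept terms so they cannot collide with ordinary words.
        concepts.extend("concept:" + c.lower() for c in CONCEPT_PATHS.get(tok, []))
    return tokens + concepts


def cosine(a, b):
    """Cosine similarity between two token lists under a bag-of-words model."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


s1 = "jaguar dealer price"
s2 = "used sedan listings"

plain = cosine(tokenize(s1), tokenize(s2))   # no shared surface words
enriched = cosine(enrich(s1), enrich(s2))    # shared concept terms
print(plain, enriched)
```

In this toy case the plain snippets have zero similarity, while the enriched versions overlap on the two shared concept terms. Any vector-based clusterer, such as K-means, could then be run on the enriched token lists instead of the raw ones.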