Automatic maintenance of web directories by mining web browsing data

Authors:
Carlos Hurtado; Marcelo Mendoza
Affiliations:
Faculty of Engineering and Science, Universidad Adolfo Ibáñez, Santiago, Chile; Computer Science Department, Universidad Técnica Federico Santa María, Santiago, Chile
Venue:
Journal of Web Engineering
Year:
2011

Abstract

Web directories allow Web users to browse a hierarchy of categories, under which different types of resources are classified. We study the problem of maintaining a Web directory, that is, the problem of continually discovering and ranking resources that are relevant to the categories of the directory. We propose an unsupervised computational method that maintains the directory by analyzing user browsing data. The method is based on the extraction and classification of user sessions (sequences of resources selected by users) into the categories of the directory. In addition, we show that the directory maintenance method can be slightly modified to find queries that help locate relevant resources, allowing users to switch from directory browsing to query formulation. Experimental results show that the proposed methods are effective: they identify new pages in each category and recommend related queries with high precision, without the labeled data that traditional Web page and query classification tasks require.
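
To make the idea of classifying sessions into categories and ranking newly discovered resources concrete, here is a minimal sketch in Python. It is not the authors' algorithm, whose details the abstract does not give. It assumes a simple overlap vote: a session is assigned to the category that shares the most pages with it, and unlisted pages are ranked by how many sessions of that category contain them. The names `classify_session`, `rank_new_resources`, and the `directory` mapping are hypothetical.

```python
from collections import Counter, defaultdict


def classify_session(session, directory):
    """Return the category sharing the most pages with the session, or None.

    Assumption: overlap voting stands in for the paper's unspecified classifier.
    Ties go to the alphabetically first category.
    """
    visited = set(session)
    best_category, best_overlap = None, 0
    for category in sorted(directory):
        overlap = len(visited & directory[category])
        if overlap > best_overlap:
            best_category, best_overlap = category, overlap
    return best_category


def rank_new_resources(sessions, directory):
    """Rank, per category, pages not yet listed by the number of sessions containing them."""
    counts = defaultdict(Counter)
    for session in sessions:
        category = classify_session(session, directory)
        if category is None:
            continue  # session touches no known page, so it gives no evidence
        for page in set(session) - directory[category]:
            counts[category][page] += 1
    # Sort by descending count, then by page name for a stable order.
    return {
        category: [page for page, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))]
        for category, c in counts.items()
    }
```

A session that visits no page already in the directory is skipped, since under this heuristic it cannot be tied to any category. No labeled training data is involved: the only input besides the sessions is the current content of the directory itself.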