A unified representation of web logs for mining applications

Authors:
Michelangelo Diligenti;Marco Gori;Marco Maggini
Affiliations:
Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Siena, Siena, Italy;Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Siena, Siena, Italy;Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Siena, Siena, Italy
Venue:
Information Retrieval
Year:
2011

Abstract

The collective feedback of the users of an Information Retrieval (IR) system has been shown to provide semantic information that, while hard to extract using standard IR techniques, can be useful in Web mining tasks. In the last few years, several approaches have been proposed to process the logs stored by Internet Service Providers (ISPs), Intranet proxies or Web search engines. However, the solutions proposed in the literature represent only part of the information available in the Web logs. In this paper, we propose a richer data structure that preserves most of the information available in the Web logs. This data structure consists of three groups of entities (users, documents and queries), which are connected in a network of relations. Query refinements correspond to separate transitions between the corresponding query nodes in the graph, while users are linked to the queries they have issued and to the documents they have selected. The classical query/document transitions, which connect a query to the documents selected by users in the returned result page, are also considered. The resulting data structure is a complete representation of the collective search activity performed by the users of a search engine or of an Intranet. The experimental results show that this more powerful representation can be successfully used in several Web mining tasks, such as discovering semantically relevant query suggestions and categorizing Web pages by topic.
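
As a rough illustration of the data structure described in the abstract, the following sketch builds the four relation types (user/query, user/document, query/document and query/query) from a toy log. It is not the authors' implementation. The record layout `(user, query, clicked_doc)`, the function name `build_log_graph`, the relation labels and the rule that two consecutive distinct queries by the same user count as a refinement are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_log_graph(records):
    """Build typed, weighted edges from (user, query, clicked_doc) records.

    Assumptions (hypothetical, not taken from the paper): records are in
    chronological order, clicked_doc may be None, and a refinement is any
    pair of consecutive distinct queries issued by the same user.
    """
    edges = defaultdict(Counter)  # relation name -> Counter of (source, target)
    last_query = {}               # user -> most recent query seen for that user
    for user, query, doc in records:
        prev = last_query.get(user)
        if prev is None or prev != query:
            # The user issued a new query (repeated rows for the same
            # query, e.g. several clicks, are counted once).
            edges["issued"][(user, query)] += 1
        if prev is not None and prev != query:
            # Query/query transition: the user refined the previous query.
            edges["refined"][(prev, query)] += 1
        last_query[user] = query
        if doc is not None:
            edges["selected"][(user, doc)] += 1   # user/document link
            edges["clicked"][(query, doc)] += 1   # query/document link
    return edges


# Toy log: u1 refines "jaguar" into "jaguar car" and clicks two results.
records = [
    ("u1", "jaguar", None),
    ("u1", "jaguar car", "d1"),
    ("u1", "jaguar car", "d2"),
    ("u2", "jaguar car", "d1"),
]
graph = build_log_graph(records)
for relation, counts in sorted(graph.items()):
    print(relation, dict(counts))
```

Keeping one counter per relation type keeps the three entity groups and their links separate, so a mining task can use only the relations it needs (for example, only the `refined` edges for query suggestion) without discarding the rest of the log.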