Gathering and mining information from web log files

Authors:
Maristella Agosti; Giorgio Maria Di Nunzio
Affiliations:
Department of Information Engineering, University of Padua, Italy; Department of Information Engineering, University of Padua, Italy
Venue:
DELOS'07 Proceedings of the 1st international conference on Digital libraries: research and development
Year:
2007

Abstract

In this paper, a general methodology for gathering and mining information from Web log files is proposed. A series of tools to retrieve, store, and analyze the data extracted from log files has been designed and implemented. The aim is to derive general methods by abstracting from the analysis of logs that use a well-defined standard format, such as the Extended Log File Format proposed by the W3C. The methodology has been tested on the Web log files of The European Library portal; the experimental analyses led to personal, technical, geographical, and temporal findings about the usage and traffic load of the portal. Considerations on more accurate tracking of users and user profiles, and on better management of crawler accesses through authentication, are presented.
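The abstract does not describe the internals of the tools. As a rough illustration of the retrieve-and-analyze steps on logs in the W3C Extended Log File Format, the Python sketch below reads the `#Fields` directive to learn the column order, treats `-` as a missing value, and tallies requests per hour, status codes, and likely crawler hits. It assumes that no field contains embedded whitespace; the format also permits quoted string fields, which this sketch does not handle. The function names, the keyword-based crawler heuristic, and the sample lines are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional

Entry = Dict[str, Optional[str]]


def parse_extended_log(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one dict per log entry, keyed by the names in the latest #Fields directive."""
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; only #Fields changes how later entries are parsed.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if not fields:
            continue  # entries before any #Fields directive cannot be interpreted
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed entries rather than guessing column alignment
        # "-" marks a missing value in the Extended Log File Format.
        yield {name: (None if value == "-" else value) for name, value in zip(fields, values)}


# Hypothetical heuristic: substrings that often appear in crawler user agents.
CRAWLER_HINTS = ("bot", "crawler", "spider")


def is_crawler(entry: Entry) -> bool:
    agent = (entry.get("cs(User-Agent)") or "").lower()
    return any(hint in agent for hint in CRAWLER_HINTS)


def summarize(entries: Iterable[Entry]) -> Dict[str, object]:
    """Count total requests, requests per hour of day, status codes, and crawler hits."""
    per_hour: Counter = Counter()
    status: Counter = Counter()
    total = 0
    crawler = 0
    for entry in entries:
        total += 1
        if entry.get("time"):
            per_hour[entry["time"][:2]] += 1  # "hh:mm:ss" -> "hh"
        if entry.get("sc-status"):
            status[entry["sc-status"]] += 1
        if is_crawler(entry):
            crawler += 1
    return {"total": total, "per_hour": dict(per_hour), "status": dict(status), "crawler": crawler}


# Illustrative input only (documentation IP range), not data from the paper.
SAMPLE = """#Version: 1.0
#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(User-Agent)
2007-01-15 09:12:03 192.0.2.10 GET /index.html 200 Mozilla/5.0
2007-01-15 09:40:51 192.0.2.11 GET /search 200 Googlebot/2.1
2007-01-15 10:05:00 192.0.2.12 GET /missing 404 -
"""

if __name__ == "__main__":
    print(summarize(parse_extended_log(SAMPLE.splitlines())))
```

Driving the parser from the `#Fields` directive rather than a fixed column layout is what lets the same code handle any log that follows the standard, which matches the abstraction goal stated above. Geographical analysis would additionally require mapping `c-ip` to a location, which this sketch leaves out.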