User Profiling and Re-identification: Case of University-Wide Network Analysis
TrustBus '09 Proceedings of the 6th International Conference on Trust, Privacy and Security in Digital Business
Analyzing characteristic host access patterns for re-identification of web user sessions
NordSec'10 Proceedings of the 15th Nordic conference on Information Security Technology for Applications
This paper presents our current work on traffic log processing. Our goal is to find an approach to modeling users based on their behavioural patterns. Because the volume of input data is very large, effective preprocessing is crucial for the profiling to provide significant results, and this paper describes our approach to restricting the input data according to its relevance. We use histogram clustering to identify sets of users with similar frequencies of communication; entropy and TF-IDF (Term Frequency-Inverse Document Frequency) then help to select the destinations that are relevant for a given set of users. The main profiling is performed on the preprocessed data, and our experiments show that restricting the input in this way has a positive impact on the significance of the results.
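
The abstract does not say how entropy and TF-IDF are computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each user group (for example, one cluster from the histogram clustering step) is treated as a "document" and each destination as a "term", and that entropy is taken over how a destination's requests are spread across users. The function names, the data layout, and the destination names are hypothetical.

```python
import math
from collections import Counter


def destination_entropy(counts_by_user):
    """Shannon entropy (in bits) of one destination's requests across users.

    Assumption: a value near 0 means the destination is used by very few
    users, while a high value means it is spread evenly over many users.
    """
    total = sum(counts_by_user.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts_by_user.values()
        if c > 0
    )


def tf_idf(group_logs):
    """Score destinations per user group with a plain TF-IDF weighting.

    group_logs maps a group name to a Counter of destination -> request count.
    TF is the destination's share of the group's requests; IDF is
    log(number of groups / number of groups that contact the destination).
    """
    n_groups = len(group_logs)
    groups_per_destination = Counter()
    for counts in group_logs.values():
        groups_per_destination.update(d for d, c in counts.items() if c > 0)

    scores = {}
    for group, counts in group_logs.items():
        total = sum(counts.values())
        scores[group] = {
            dest: (c / total) * math.log(n_groups / groups_per_destination[dest])
            for dest, c in counts.items()
            if c > 0
        }
    return scores


# Hypothetical toy data: two user groups and three destinations.
group_logs = {
    "g1": Counter({"mail.example": 8, "search.example": 2}),
    "g2": Counter({"search.example": 5, "lab.example": 5}),
}
scores = tf_idf(group_logs)
```

In this toy data a destination contacted by every group (search.example) receives a TF-IDF score of zero, while destinations specific to one group score higher, which is the kind of relevance filtering the abstract describes. How the two measures are combined and thresholded is not stated in the abstract and is left open here.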