Automated extraction of behavioural profiles from document usage

Authors:
S. D. Wolthusen
Affiliations:
-
Venue:
BT Technology Journal
Year:
2007

Citing 21
Cited 0

An approach to the automatic construction of global thesauri

Information Processing and Management: an International Journal
Automatic indexing based on Bayesian inference networks

SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Automated learning of decision rules for text categorization

ACM Transactions on Information Systems (TOIS)
Training algorithms for linear text classifiers

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Context-sensitive learning methods for text categorization

ACM Transactions on Information Systems (TOIS)
Foundations of statistical natural language processing

Foundations of statistical natural language processing
Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone

SIGDOC '86 Proceedings of the 5th annual international conference on Systems documentation
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
Formal Concept Analysis: Mathematical Foundations

Formal Concept Analysis: Mathematical Foundations
Inside Microsoft Windows 2000

Inside Microsoft Windows 2000
A decision theory approach to optimal automatic indexing

SIGIR '82 Proceedings of the 5th annual ACM conference on Research and development in information retrieval
Information Extraction: Towards Scalable, Adaptable Systems

Information Extraction: Towards Scalable, Adaptable Systems
An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet

CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Transparent Access To Encrypted Data Using Operating System Network Stack Extensions

Proceedings of the IFIP TC6/TC11 International Conference on Communications and Multimedia Security Issues of the New Century
A scalability analysis of classifiers in text categorization

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Security Policy Enforcement at the File System Level in the Windows NT Operating System Family

ACSAC '01 Proceedings of the 17th Annual Computer Security Applications Conference
Survey of Text Mining

Survey of Text Mining
Relationship-based clustering and cluster ensembles for high-dimensional data mining

Relationship-based clustering and cluster ensembles for high-dimensional data mining
Microsoft Windows Internals, Fourth Edition: Microsoft Windows Server(TM) 2003, Windows XP, and Windows 2000 (Pro-Developer)

Microsoft Windows Internals, Fourth Edition: Microsoft Windows Server(TM) 2003, Windows XP, and Windows 2000 (Pro-Developer)
Extended gloss overlaps as a measure of semantic relatedness

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Hierarchical Bayesian clustering for automatic text classification

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

Quantified Score

Hi-index	0.00

Visualization

Abstract

Both human analysts and particularly automated tool suites are capable of deriving sensitive information and conclusions from collections of data items that individually cannot be considered critical or sensitive. This activity of analysing and correlating material that is not immediately related is, in fact, highly desirable in many application areas and cannot be controlled precisely in advance. The decision whether a program or an analyst is performing searches and correlations beyond the scope of his authorisation or current mission can frequently be determined only ex post based on a heuristic analysis of documents accessed.In this paper we describe a mechanism for the instrumentation of operating systems to obtain information on the documents and resources accessed by arbitrary processes. Such a mechanism could be an important component of the infrastructure of an operational risk management system, generating an audit trail for compliance and forensic investigation, and acting as a sensor generating data for analysis. Addressing the latter application, the paper also outlines an approach for extracting textual information and metadata from accessed documents, regardless of the application program and workflow mechanisms used, without unduly impeding either workflows or operator performance.This information can then be subjected to an heuristic analysis based on natural language processing to extract the semantic context of each document or segment. Clustering this content and extracting the conceptual patterns that a user has accessed can then allow abnormal behaviour to be identified. This can then be refined further to determine heuristically whether the authorised remit of the user has been breached and whether an investigation is warranted. We argue that the risk of misbehaviour can be reduced while at the same time increasing productivity. This is made possible by enhancing the degree of freedom for individual users to act in the interest of their mission objectives and at the same time providing automated mechanisms for analysing user behaviour.