Managing very large document collections using semantics
Journal of Computer Science and Technology
File Classification in Self-* Storage Systems
ICAC '04 Proceedings of the First International Conference on Autonomic Computing
YALE: rapid prototyping for complex data mining tasks
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
QueST: querying music databases by acoustic and textual features
Proceedings of the 15th international conference on Multimedia
Using provenance to aid in personal file search
ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
Layering in provenance systems
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
Provenance for system troubleshooting
LISA'11 Proceedings of the 25th international conference on Large Installation System Administration
Network analysis on provenance graphs from a crowdsourcing application
IPAW'12 Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes
Provenance from log files: a BigData problem
Proceedings of the Joint EDBT/ICDT 2013 Workshops
Static compiler analysis for workflow provenance
WORKS '13 Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science
Hi-index | 0.00 |
Rich, semantically descriptive file attributes are valuable in many contexts, such as semantic namespaces and desktop search. Descriptive attributes help users to find files placed in seemingly-arbitrary locations by different applications. However, extracting semantic attributes from file contents is nontrivial. An alternative is to examine file provenance: how and when files are used, and the agents that use them. We study the extraction of semantic attributes from file provenance by applying data mining and machine learning techniques to file metadata. We show that provenance and other metadata predict semantic attributes such as file extensions. This complements previous work, which has shown that file extensions predict access patterns.