Content independent metadata production as a machine learning problem

Authors:
Sahar Changuel;Nicolas Labroche
Affiliations:
CNRS, UMR7606, LIP6, Université Pierre et Marie Curie, Paris 6, France;CNRS, UMR7606, LIP6, Université Pierre et Marie Curie, Paris 6, France
Venue:
MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2012

Citing 18
Cited 1

C4.5: programs for machine learning

C4.5: programs for machine learning
The nature of statistical learning theory

The nature of statistical learning theory
Bagging predictors

Machine Learning
Fast training of support vector machines using sequential minimal optimization

Advances in kernel methods
Pruning and summarizing the discovered associations

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Knowledge-based metadata extraction from PostScript files

DL '00 Proceedings of the fifth ACM conference on Digital libraries
The Ariadne knowledge pool system

Communications of the ACM
Automatic document metadata extraction using support vector machines

Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
Automatic image annotation and retrieval using cross-media relevance models

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Semantic web construction: an inquiry of authors' views on collaborative metadata generation

DCMI '02 Proceedings of the 2002 international conference on Dublin core and metadata applications: Metadata for e-communities: supporting diversity and convergence
An efficient algorithm for mining frequent closed itemsets in dynamic transaction databases

International Journal of Intelligent Systems Technologies and Applications
Functionalities for automatic metadata generation applications: a survey of metadata experts' opinions

International Journal of Metadata, Semantics and Ontologies
A General Learning Method for Automatic Title Extraction from HTML Pages

MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
The WEKA data mining software: an update

ACM SIGKDD Explorations Newsletter
Automatic Web Pages Author Extraction

FQAS '09 Proceedings of the 8th International Conference on Flexible Query Answering Systems
Estimating continuous distributions in Bayesian classifiers

UAI'95 Proceedings of the Eleventh conference on Uncertainty in artificial intelligence
Generating and evaluating automatic metadata for educational resources

ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries

Mining groups of common interest: discovering topical communities with network flows

MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition

Quantified Score

Hi-index	0.00

Visualization

Abstract

Metadata provide a high-level description of digital library resources and represent the key to enable the discovery and selection of suitable resources. However the growth in size and diversity of digital collections makes manual metadata extraction an expensive task. This paper proposes a new content independent method to automatically generate metadata in order to characterize resources in a given learning objects repository. The key idea is to rely on few existing metadata to learn predictive models of metadata values. The proposed method is content independent and handles resources in different formats: text, image, video, Java applet, etc. Two classical machine learning approaches are studied in this paper: in the first approach a supervised machine learning technique classify each value of a metadata field to be predicted according to the other a-priori filled metadata fields. The second approach used the FP-Growth algorithm to discover relationships between the different metadata fields as association rules. Experiments on two well-known educational data repositories show that both approaches can enhance metadata extraction and can even fill subjective metadata fields that are difficult to extract from the content of a resource, such as the difficulty of a resource.