Unsupervised Segmentation of Bibliographic Elements with Latent Permutations

Authors:
Tomonari Masada
Affiliations:
Nagasaki University, Japan
Venue:
International Journal of Organizational and Collective Intelligence
Year:
2011

Citing 7
Cited 1

Bibliographic attribute extraction from erroneous references based on a statistical model

Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
Latent dirichlet allocation

The Journal of Machine Learning Research
A simple method for citation metadata extraction using hidden markov models

Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Incorporating domain knowledge into topic modeling via Dirichlet Forest priors

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Bayesian unsupervised topic segmentation

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Global models of document structure using latent permutations

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Metadata extraction from bibliographies using bigram HMM

ICADL'04 Proceedings of the 7th international Conference on Digital Libraries: international collaboration and cross-fertilization

Semi-supervised bibliographic element segmentation with latent permutations

ICADL'11 Proceedings of the 13th international conference on Asia-pacific digital libraries: for cultural heritage, knowledge dissemination, and future creation

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper introduces a new approach for large-scale unsupervised segmentation of bibliographic elements. The problem is segmenting a citation given as an untagged word token sequence into subsequences so that each subsequence corresponds to a different bibliographic element e.g., authors, paper title, journal name, publication year, etc.. The same bibliographic element should be referred to by contiguous word tokens. This constraint is called contiguity constraint. The authors meet this constraint by using generalized Mallows models, effectively applied to document structure learning by Chen, Branavan, Barzilay, and Karger 2009. However, the method works for this problem only after modification. Therefore, the author proposes strategies to make the method applicable to this problem.