Bibliographic attribute extraction from erroneous references based on a statistical model
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
The Journal of Machine Learning Research
A simple method for citation metadata extraction using hidden Markov models
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Incorporating domain knowledge into topic modeling via Dirichlet Forest priors
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Bayesian unsupervised topic segmentation
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Global models of document structure using latent permutations
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Metadata extraction from bibliographies using bigram HMM
ICADL'04 Proceedings of the 7th international Conference on Digital Libraries: international collaboration and cross-fertilization
Semi-supervised bibliographic element segmentation with latent permutations
ICADL'11 Proceedings of the 13th international conference on Asia-pacific digital libraries: for cultural heritage, knowledge dissemination, and future creation
This paper introduces a new approach to large-scale unsupervised segmentation of bibliographic elements. The problem is to segment a citation, given as an untagged sequence of word tokens, into subsequences such that each subsequence corresponds to a different bibliographic element (e.g., authors, paper title, journal name, or publication year). Each bibliographic element must be referred to by contiguous word tokens; this requirement is called the contiguity constraint. The approach satisfies this constraint by using generalized Mallows models, which Chen, Branavan, Barzilay, and Karger (2009) applied effectively to document structure learning. However, their method works for this problem only after modification, so the paper proposes strategies that make it applicable.
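To make the contiguity constraint and the role of a generalized Mallows model concrete, the following is a minimal sketch and not the paper's implementation. It assumes the inversion-count parameterization commonly used for generalized Mallows models, in which an ordering of n items is encoded by counts v_j (the number of items greater than j that precede j) with P(v_j) proportional to exp(-rho_j * v_j). The function names, the label strings, and the dispersion values are hypothetical and chosen only for illustration.

```python
import math
import random
from itertools import groupby


def is_contiguous(labels):
    """Return True if every label occupies a single unbroken run of tokens."""
    runs = [label for label, _ in groupby(labels)]
    return len(runs) == len(set(runs))


def sample_inversions(rhos, rng):
    """Draw each v_j from P(v_j) proportional to exp(-rho_j * v_j), v_j in 0..n-j."""
    n = len(rhos) + 1
    counts = []
    for j, rho in enumerate(rhos, start=1):
        support = range(n - j + 1)
        weights = [math.exp(-rho * v) for v in support]
        counts.append(rng.choices(support, weights=weights)[0])
    return counts


def inversions_to_permutation(counts):
    """Rebuild the ordering from inversion counts.

    Items are inserted from largest to smallest, so when item j is placed
    at index v_j, exactly v_j greater items precede it.
    """
    n = len(counts) + 1
    order = [n]
    for j in range(n - 1, 0, -1):
        order.insert(counts[j - 1], j)
    return order


if __name__ == "__main__":
    # Hypothetical element types; 1..4 index them in a canonical order.
    elements = {1: "authors", 2: "title", 3: "journal", 4: "year"}
    rng = random.Random(0)
    # Assumed dispersion parameters: larger values favor the canonical order.
    counts = sample_inversions([2.0, 2.0, 2.0], rng)
    order = inversions_to_permutation(counts)
    print([elements[i] for i in order])
    print(is_contiguous(["authors", "authors", "title", "year"]))
```

Because each element type appears exactly once in a sampled ordering, any segmentation that assigns tokens to elements in that order satisfies the contiguity constraint by construction, which is why a permutation model suits this problem.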