In recent years, huge volumes of research papers have become available on the World Wide Web. Metadata offers an effective way to organize and retrieve these resources, so the automatic extraction of metadata from these papers and their bibliographies is valuable and has been widely studied. In this paper, we use a bigram HMM (Hidden Markov Model) to automatically extract metadata (e.g., title, author, date, journal, pages) from bibliographies in various styles. Unlike the traditional HMM, which uses only word frequencies, this model also takes into account the bigram sequential relation between words and their position within text fields. We evaluated the model on a real corpus downloaded from the Web and compared it with other methods. In these experiments, the bigram HMM yielded the best results, and it appears to be the most promising candidate for metadata extraction from bibliographies.
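The abstract does not give the model's equations, so the following is only a minimal sketch of one way the idea could be realized, not the authors' method. Hidden states are metadata fields. Each word's emission probability linearly interpolates a within-field bigram estimate P(w | previous word, field) with a unigram (word-frequency) estimate P(w | field). A start-of-field token `<s>` stands in for position information. The class name `BigramHMM`, the interpolation weight `lam`, the `<s>` token, the additive smoothing, and the toy training data are all hypothetical. Decoding uses standard Viterbi, where the emission context depends on whether the previous word stayed in the same field.

```python
import math
from collections import defaultdict


class BigramHMM:
    """Hypothetical HMM: states are metadata fields, emissions mix a
    within-field bigram estimate with a unigram (word-frequency) estimate."""

    def __init__(self, smoothing=1e-3, lam=0.5):
        self.smoothing = smoothing  # assumed additive-smoothing constant
        self.lam = lam              # assumed weight of the bigram estimate
        self.trans = defaultdict(lambda: defaultdict(int))  # incl. "<start>"
        self.uni = defaultdict(lambda: defaultdict(int))    # field -> word counts
        self.bi = defaultdict(lambda: defaultdict(int))     # (field, prev) -> word counts
        self.states, self.vocab = set(), set()

    def fit(self, sequences):
        """sequences: list of [(word, field), ...] token sequences."""
        for seq in sequences:
            prev_state, prev_word = "<start>", None
            for word, state in seq:
                self.states.add(state)
                self.vocab.add(word)
                self.trans[prev_state][state] += 1
                self.uni[state][word] += 1
                # "<s>" marks the first word of a field (assumed position feature)
                ctx = prev_word if prev_state == state else "<s>"
                self.bi[(state, ctx)][word] += 1
                prev_state, prev_word = state, word
        return self

    def _prob(self, table, key, item, size):
        # Additively smoothed relative frequency of `item` given `key`
        counts = table[key]
        total = sum(counts.values())
        return (counts[item] + self.smoothing) / (total + self.smoothing * size)

    def _emit(self, state, word, ctx):
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        p_uni = self._prob(self.uni, state, word, v)
        p_bi = self._prob(self.bi, (state, ctx), word, v)
        return math.log(self.lam * p_bi + (1 - self.lam) * p_uni)

    def decode(self, words):
        """Viterbi: return the most likely field label for each word."""
        if not words:
            return []
        states = sorted(self.states)
        n = len(states)
        delta = {s: math.log(self._prob(self.trans, "<start>", s, n))
                 + self._emit(s, words[0], "<s>") for s in states}
        back = []
        for t in range(1, len(words)):
            new_delta, pointers = {}, {}
            for s in states:
                best_prev, best_score = None, -math.inf
                for p in states:
                    # Bigram context only applies when the field continues
                    ctx = words[t - 1] if p == s else "<s>"
                    score = (delta[p]
                             + math.log(self._prob(self.trans, p, s, n))
                             + self._emit(s, words[t], ctx))
                    if score > best_score:
                        best_prev, best_score = p, score
                new_delta[s], pointers[s] = best_score, best_prev
            delta = new_delta
            back.append(pointers)
        # Trace the best path backwards from the highest-scoring final state
        path = [max(delta, key=delta.get)]
        for pointers in reversed(back):
            path.append(pointers[path[-1]])
        return list(reversed(path))


# Toy, hand-made training data (hypothetical, not from the paper's corpus)
TRAIN = [
    [("Smith", "author"), ("J.", "author"), ("Hidden", "title"),
     ("Markov", "title"), ("models", "title"), ("2001", "date")],
    [("Lee", "author"), ("K.", "author"), ("Markov", "title"),
     ("chains", "title"), ("1999", "date")],
]

if __name__ == "__main__":
    model = BigramHMM().fit(TRAIN)
    print(model.decode(["Smith", "J.", "Markov", "models", "2001"]))
    # prints ['author', 'author', 'title', 'title', 'date']
```

Conditioning the bigram on the previous word only when the field continues keeps the model first-order, so ordinary Viterbi still finds the exact best labeling; a real system would also need tokenization of raw references and a larger field set (journal, pages, and so on).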