Automatic segmentation of text into structured records
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Hidden Markov Model} Induction by Bayesian Model Merging
Advances in Neural Information Processing Systems 5, [NIPS Conference]
Nymble: a high-performance learning name-finder
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
AUTOBIB: Automatic Extraction of Bibliographic Information on the Web
IDEAS '04 Proceedings of the International Database Engineering and Applications Symposium
A second-order Hidden Markov Model for part-of-speech tagging
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Reference metadata extraction using a hierarchical knowledge representation framework
Decision Support Systems
FLUX-CIM: flexible unsupervised extraction of citation metadata
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
EURASIP Journal on Applied Signal Processing
A simple method for citation metadata extraction using hidden markov models
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Anchor text extraction for academic search
NLPIR4DL '09 Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries
A non-linear index to evaluate a journal's scientific impact
Information Sciences: an International Journal
Unsupervised strategies for information extraction by text segmentation
Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research
Meta-metadata: a metadata semantics language for collection representation applications
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Metadata extraction from bibliographies using bigram HMM
ICADL'04 Proceedings of the 7th international Conference on Digital Libraries: international collaboration and cross-fertilization
Minimizing the ripple effect of web-centric software by using the pheromone extension
Information Sciences: an International Journal
Hi-index | 0.08 |
Our objective was to explore an efficient and accurate extraction of metadata such as author, title and institution from heterogeneous references, using hidden Markov models (HMMs). The major contributions of the research were the (i) development of a trigram, full second order hidden Markov model with more priority to words emitted in transitions to the same state, with a corresponding new Viterbi algorithm (ii) introduction of a new smoothing technique for transition probabilities and (iii) proposal of a modification of back-off shrinkage technique for emission probabilities. The effect of the size of data set on the training procedure was also measured. Comparisons were made with other related works and the model was evaluated with three different data sets. The results showed overall accuracy, precision, recall and F1 measure of over 95% suggesting that the method outperforms other related methods in the task of metadata extraction from references.