This paper describes a simple method for extracting metadata fields from citations using hidden Markov models (HMMs). The method is easy to implement and can achieve precision and recall on heterogeneous citations comparable to or greater than those of other HMM-based methods. It consists largely of string manipulation and otherwise depends only on an implementation of the Viterbi algorithm, which is widely available, so diverse digital library systems can implement it.
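To make the two ingredients named above concrete, here is a minimal Python sketch of string manipulation followed by Viterbi decoding. It is an illustration only, not the paper's implementation: the token classes (`CAP`, `YEAR`, `LOWER`), the three field states, and every probability are hypothetical values chosen by hand, whereas a real system would estimate them from labeled citations.

```python
import math

def token_class(token):
    """String manipulation step: map a token to a coarse symbol (hypothetical classes)."""
    if token.isdigit() and len(token) == 4:
        return "YEAR"
    if token[:1].isupper():
        return "CAP"
    return "LOWER"

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a non-empty observation list."""
    def lp(p):
        # Log-probability with a small floor so that unseen events do not give log(0).
        return math.log(max(p, 1e-12))

    # scores[t][s]: best log-probability of any path that ends in state s at step t.
    scores = [{s: lp(start_p.get(s, 0.0)) + lp(emit_p[s].get(observations[0], 0.0))
               for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        scores.append({})
        back.append({})
        for s in states:
            prev, best = max(
                ((r, scores[t - 1][r] + lp(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda pair: pair[1],
            )
            scores[t][s] = best + lp(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Hand-picked toy model (assumption: not taken from the paper).
states = ["author", "year", "title"]
start_p = {"author": 0.9, "year": 0.05, "title": 0.05}
trans_p = {
    "author": {"author": 0.6, "year": 0.3, "title": 0.1},
    "year": {"author": 0.05, "year": 0.05, "title": 0.9},
    "title": {"author": 0.05, "year": 0.05, "title": 0.9},
}
emit_p = {
    "author": {"CAP": 0.9, "LOWER": 0.05, "YEAR": 0.05},
    "year": {"CAP": 0.02, "LOWER": 0.02, "YEAR": 0.96},
    "title": {"CAP": 0.3, "LOWER": 0.65, "YEAR": 0.05},
}

tokens = "Smith Jones 2001 Parsing citations quickly".split()
labels = viterbi([token_class(t) for t in tokens], states, start_p, trans_p, emit_p)
print(list(zip(tokens, labels)))
```

With this toy model the decoder labels the first two tokens as author, `2001` as year, and the remaining tokens as title. Working in log space avoids numeric underflow on long citation strings, and the probability floor is a crude stand-in for proper smoothing.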