Information extraction from research papers using conditional random fields

Authors:
Fuchun Peng; Andrew McCallum
Affiliations:
BBN Technologies, Cambridge, MA; Department of Computer Science, University of Massachusetts Amherst, Amherst, MA
Venue:
Information Processing and Management: an International Journal
Year:
2006

Citing 17
Cited 36

Inducing Features of Random Fields

IEEE Transactions on Pattern Analysis and Machine Intelligence
Hardening soft information sources

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Automating the Construction of Internet Portals with Machine Learning

Information Retrieval
Digital Libraries and Autonomous Citation Indexing

Computer
On the Estimation of 'Small' Probabilities by Leaving-One-Out

IEEE Transactions on Pattern Analysis and Machine Intelligence
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Discriminative Reranking for Natural Language Parsing

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Automatic document metadata extraction using support vector machines

Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
Bibliographic attribute extraction from erroneous references based on a statistical model

Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
Table extraction using conditional random fields

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
An integrated, conditional model of information extraction and coreference with application to citation matching

UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Ranking algorithms for named-entity extraction: boosting and the voted perceptron

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Shallow parsing with conditional random fields

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A comparison of algorithms for maximum entropy parameter estimation

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Technical paper recommendation: a study in combining multiple information sources

Journal of Artificial Intelligence Research
Efficiently inducing features of conditional random fields

UAI'03 Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence

Bibliometric impact measures leveraging topic analysis

Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Efficient inference on sequence segmentation models

ICML '06 Proceedings of the 23rd international conference on Machine learning
Creating probabilistic databases from information extraction models

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Fuzzy support vector machine for multi-class text categorization

Information Processing and Management: an International Journal
Comparisons of sequence labeling algorithms and extensions

Proceedings of the 24th international conference on Machine learning
Domain adaptation of information extraction models

ACM SIGMOD Record
One-against-one fuzzy support vector machine classifier: An approach to text categorization

Expert Systems with Applications: An International Journal
Improving Legal Document Summarization Using Graphical Models

Proceedings of the 2006 conference on Legal Knowledge and Information Systems: JURIX 2006: The Nineteenth Annual Conference
Learning and inference with constraints

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 3
Document summarization using conditional random fields

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
ONDUX: on-demand unsupervised learning for information extraction

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Unsupervised strategies for information extraction by text segmentation

Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research
Open information extraction using Wikipedia

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Extracting opinion targets in a single- and cross-domain setting with conditional random fields

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Kairos: proactive harvesting of research paper metadata from scientific conference web sites

ICADL'10 Proceedings of the role of digital libraries in a time of global change, and 12th international conference on Asia-Pacific digital libraries
SciPlore Xtract: extracting titles from scientific PDF documents by analyzing style information

ECDL'10 Proceedings of the 14th European conference on Research and advanced technology for digital libraries
Identification of rhetorical roles for segmentation and summarization of a legal judgment

Artificial Intelligence and Law
A citation-based approach to automatic topical indexing of scientific literature

Journal of Information Science
Parsing citations in biomedical articles using conditional random fields

Computers in Biology and Medicine
Joint unsupervised structure discovery and information extraction

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
The grouped author-topic model for unsupervised entity resolution

ICANN'11 Proceedings of the 21st international conference on Artificial neural networks - Volume Part I
Expansion finding for given acronyms using conditional random fields

WAIM'11 Proceedings of the 12th international conference on Web-age information management
Automatic annotation of bibliographical references in digital humanities books, articles and blogs

Proceedings of the 4th ACM workshop on Online books, complementary social media and crowdsourcing
Regularisation techniques for conditional random fields: parameterised versus parameter-free

IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
A hybrid two-stage approach for discipline-independent canonical representation extraction from references

Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Self-supervised learning approach for extracting citation information on the web

APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Evaluation of BILBO reference parsing in digital humanities via a comparison of different tools

Proceedings of the 2012 ACM symposium on Document engineering
Minimum-risk training of approximate CRF-based NLP systems

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
WiSeNet: building a wikipedia-based semantic network with ontologized relations

Proceedings of the 21st ACM international conference on Information and knowledge management
A Two-Phase Framework for Learning Logical Structures of Paragraphs in Legal Articles

ACM Transactions on Asian Language Information Processing (TALIP)
Event argument extraction based on CRF

CLSW'12 Proceedings of the 13th Chinese conference on Chinese Lexical Semantics
Mining Publication Records on Personal Publication Web Pages Based on Conditional Random Fields

WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Class-indexing-based term weighting for automatic text classification

Information Sciences: an International Journal
Towards a database for genotype-phenotype association research: mining data from encyclopaedia

International Journal of Data Mining and Bioinformatics
Practical extraction of disaster-relevant information from social media

Proceedings of the 22nd international conference on World Wide Web companion
Exploiting a proximity-based positional model to improve the quality of information extraction by text segmentation

ADC '13 Proceedings of the Twenty-Fourth Australasian Database Conference - Volume 137

Abstract

With the increasing use of research paper search engines, such as CiteSeer, for both literature search and hiring decisions, the accuracy of such systems is of paramount importance. This article employs conditional random fields (CRFs) for the task of extracting various common fields from the headers and citations of research papers. CRFs provide a principled way of incorporating various local features, external lexicon features and global layout features. The basic theory of CRFs is becoming well understood, but best practices for applying them to real-world data require additional exploration. We make an empirical exploration of several factors, including variations on Gaussian, Laplace and hyperbolic-L1 priors for improved regularization, and several classes of features. Based on CRFs, we further present a novel approach for constraint co-reference information extraction; i.e., improving extraction performance given that we know some citations refer to the same publication. On a standard benchmark dataset, we achieve new state-of-the-art performance, reducing error in average F1 by 36%, and word error rate by 78% in comparison with the previous best SVM results. Accuracy compares even more favorably against HMMs. On four co-reference IE datasets, our system significantly improves extraction performance, with an error rate reduction of 6-14%.
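
The abstract names three priors used to regularize CRF training but does not give their formulas. As a rough illustration only, the sketch below writes each prior as a penalty subtracted from the training log-likelihood. The Gaussian and Laplace forms are the usual textbook ones; the hyperbolic-L1 form (a sum of log cosh terms) is an assumption about what the article means, and every function name and default value here is hypothetical rather than taken from the article.

```python
import math


def gaussian_penalty(weights, sigma2=1.0):
    """Zero-mean Gaussian prior as a penalty: sum(w^2) / (2 * sigma2)."""
    return sum(w * w for w in weights) / (2.0 * sigma2)


def laplace_penalty(weights, alpha=1.0):
    """Laplace (double-exponential) prior as a penalty: alpha * sum(|w|)."""
    return alpha * sum(abs(w) for w in weights)


def hyperbolic_l1_penalty(weights):
    """Assumed hyperbolic-L1 form: sum(log cosh(w)).

    Uses log cosh(w) = |w| + log1p(exp(-2|w|)) - log 2 so that large
    weights do not overflow.
    """
    return sum(
        abs(w) + math.log1p(math.exp(-2.0 * abs(w))) - math.log(2.0)
        for w in weights
    )


def penalized_objective(log_likelihood, weights, penalty):
    """Regularized training objective: log-likelihood minus the chosen penalty."""
    return log_likelihood - penalty(weights)


if __name__ == "__main__":
    # Hypothetical weights and log-likelihood, for illustration only.
    weights = [1.0, -2.0]
    for name, fn in [
        ("gaussian", gaussian_penalty),
        ("laplace", laplace_penalty),
        ("hyperbolic-L1", hyperbolic_l1_penalty),
    ]:
        print(name, penalized_objective(-10.0, weights, fn))
```

Under these assumed forms, the Gaussian penalty grows quadratically in each weight, while the Laplace and log cosh penalties grow linearly for large weights; the log cosh penalty is also smooth at zero, unlike the absolute value.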