With the increasing use of research paper search engines such as CiteSeer for both literature search and hiring decisions, the accuracy of such systems is of paramount importance. This article employs conditional random fields (CRFs) for the task of extracting various common fields from the headers and citations of research papers. CRFs provide a principled way of incorporating various local features, external lexicon features and global layout features. The basic theory of CRFs is becoming well understood, but best practices for applying them to real-world data require additional exploration. We make an empirical exploration of several factors, including variations on Gaussian, Laplace and hyperbolic-L1 priors for improved regularization, and several classes of features. Building on CRFs, we further present a novel approach for constrained co-reference information extraction, i.e., improving extraction performance given that we know some citations refer to the same publication. On a standard benchmark dataset, we achieve new state-of-the-art performance, reducing error in average F1 by 36% and word error rate by 78% in comparison with the previous best SVM results. Accuracy compares even more favorably against HMMs. On four co-reference IE datasets, our system significantly improves extraction performance, with an error rate reduction of 6-14%.