This paper describes experiments with Conditional Random Fields (CRFs) for extracting bibliographical references from patent documents. CRFs are used for extraction and parsing tasks, both of which are expressed as sequence tagging problems. The automatic recognition covers references to other patent documents and to scholarly publications, both of which are characterized by highly variable contexts and patterns. Our work is not limited to the extraction of reference blocks: it also includes fine-grained parsing and the resolution of the bibliographical references, based on data normalization and access to different online bibliographical services. For these tasks, CRF models significantly outperform existing rule-based algorithms and other machine learning techniques. In particular, patent reference extraction reaches very high performance, with the error rate reduced by approximately 75% compared to previous work.
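To make the phrase "expressed as sequence tagging problems" concrete, the following is a minimal sketch of how reference spans in a tokenized sentence could be encoded as per-token labels and decoded back. It assumes a BIO labeling scheme and two hypothetical label names, `PATENT` and `NPL` (non-patent literature); neither the scheme nor the label set is taken from the paper, and the sketch shows only the problem encoding, not a CRF model.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Encode non-overlapping spans as one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence back into labeled spans."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes a span that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical example sentence and annotations (illustrative only).
tokens = "as described in US 5,123,456 and in Smith et al. , Nature , 2001 .".split()
spans = [(3, 5, "PATENT"), (7, 14, "NPL")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Under this encoding, a sequence model such as a CRF would be trained to predict the tag column from the token column, and the decoded spans would be the extracted reference blocks. Fine-grained parsing can be framed the same way, with labels for fields inside a reference instead of whole references.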