Automatic segmentation of text into structured records
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
Automatic document metadata extraction using support vector machines
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Mining reference tables for automatic text segmentation
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Integrating Unstructured Data into Relational Databases
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Information extraction from research papers using conditional random fields
Information Processing and Management: an International Journal
Reference metadata extraction using a hierarchical knowledge representation framework
Decision Support Systems
Foundations and Trends in Databases
A flexible approach for extracting metadata from bibliographic citations
Journal of the American Society for Information Science and Technology
Recovering semantics of tables on the web
Proceedings of the VLDB Endowment
Mining Publication Records on Personal Publication Web Pages Based on Conditional Random Fields
WI-IAT '12 Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Hi-index | 0.00 |
In this paper, we propose a framework for automatically training a model to extract citation information on the web. Constructing manually labeled training data to learn an extraction model is tedious, time consuming and difficult to be applied to several styles of citations with different types of entities. To eliminate the requirement of manually labeled training data, we exploit a knowledge base of citation domain and web search to derive labeled training data automatically. Our experiments show that the combination of knowledge base, heuristics and statistical methods can automate the extraction process and achieve good performance.