Table extraction using conditional random fields

Authors:
David Pinto;Andrew McCallum;Xing Wei;W. Bruce Croft
Affiliations:
University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA
Venue:
dg.o '03 Proceedings of the 2003 annual national conference on Digital government research
Year:
2003

Abstract

The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. However, their rich combination of formatting and content presents difficulties for traditional language modeling techniques. This poster presents the use of conditional random fields (CRFs) for table extraction and compares them with hidden Markov models (HMMs). Unlike HMMs, CRFs support the use of many rich and overlapping layout and language features, and as a result they perform significantly better.
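
To make "rich and overlapping layout and language features" concrete, the sketch below computes a few per-line features of the kind a CRF line labeler could consume. It is an illustration only: the feature names, the thresholds, and the helper functions `line_features` and `document_features` are hypothetical and are not taken from the poster, and no CRF training or HMM comparison is shown.

```python
import re


def line_features(line: str) -> dict:
    """Return several overlapping layout and content features for one text line.

    All feature names and thresholds here are illustrative assumptions.
    """
    stripped = line.strip()
    n = len(stripped)
    digits = sum(ch.isdigit() for ch in stripped)
    alphas = sum(ch.isalpha() for ch in stripped)
    # Runs of two or more spaces between non-space characters often
    # separate table columns in plain-text documents.
    gaps = re.findall(r"\S {2,}(?=\S)", stripped)
    return {
        "blank": n == 0,
        "n_column_gaps": len(gaps),
        "has_column_gaps": len(gaps) >= 2,
        "mostly_digits": n > 0 and digits / n > 0.4,
        "mostly_letters": n > 0 and alphas / n > 0.6,
        "separator_line": n > 0 and set(stripped) <= set("-=_ "),
    }


def document_features(lines):
    """Compute per-line features plus one context feature from the previous line."""
    feats = [line_features(line) for line in lines]
    for i, f in enumerate(feats):
        prev = feats[i - 1] if i > 0 else None
        f["prev_has_column_gaps"] = bool(prev and prev["has_column_gaps"])
    return feats


if __name__ == "__main__":
    sample = [
        "Crop yields by year",
        "Year   Wheat   Corn",
        "1999    120.5    98.2",
        "",
    ]
    for text, feats in zip(sample, document_features(sample)):
        print(repr(text), feats)
```

Several of these features can be true for the same line (for example, a data row both has column gaps and is mostly digits). A conditional model such as a CRF can use them together without assuming they are independent, which is the property the abstract contrasts with HMMs.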