Tables are a commonly used presentation scheme, especially for describing relational information. However, table understanding remains an open problem. In this paper, we consider the problem of table detection in web documents. Its potential applications include web mining, knowledge management, and web content summarization and delivery to narrow-bandwidth devices. We describe a machine learning based approach to classify each given table entity as either genuine or non-genuine. Various features reflecting the layout as well as the content characteristics of tables are studied.

In order to facilitate the training and evaluation of our table classifier, we designed a novel web document table ground truthing protocol and used it to build a large table ground truth database. The database consists of 1,393 HTML files collected from hundreds of different web sites and contains 11,477 leaf TABLE elements, out of which 1,740 are genuine tables. Experiments were conducted using cross-validation, and an F-measure of 95.89% was achieved.
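As a rough illustration of two terms used above, the following Python sketch counts leaf TABLE elements in an HTML string and computes an F-measure from precision and recall. It assumes that a "leaf" TABLE is one containing no nested TABLE and that the F-measure is the balanced harmonic mean of precision and recall; the abstract states neither definition explicitly. The names `LeafTableCounter`, `count_leaf_tables`, and `f_measure` are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser


class LeafTableCounter(HTMLParser):
    """Counts TABLE elements and those with no nested TABLE (assumed 'leaf')."""

    def __init__(self):
        super().__init__()
        # One flag per currently open table: True once a nested table is seen.
        self.stack = []
        self.total = 0
        self.leaves = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "TABLE" and "table" both match.
        if tag == "table":
            if self.stack:
                self.stack[-1] = True  # the enclosing table is not a leaf
            self.stack.append(False)
            self.total += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.stack:
            has_nested = self.stack.pop()
            if not has_nested:
                self.leaves += 1


def count_leaf_tables(html):
    """Return (total tables, leaf tables) found in an HTML string."""
    parser = LeafTableCounter()
    parser.feed(html)
    parser.close()
    return parser.total, parser.leaves


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall (assumed form)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a page with one layout table wrapping a single data table has two TABLE elements but only one leaf, and only that leaf would be passed to a genuine/non-genuine classifier.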