Spreadsheets contain a huge amount of high-value data, but they do not observe a standard data model and are therefore difficult to integrate. Many data integration tools exist, but they generally work only on relational data. Existing systems for extracting relational data from spreadsheets are too labor-intensive to support ad hoc integration tasks, in which the correct extraction target is learned only during the course of user interaction. This paper introduces a system that automatically extracts relational data from spreadsheets, thereby enabling relational spreadsheet integration. The resulting integrated relational data can be queried directly or translated into RDF triples. When compared to standard techniques for spreadsheet data extraction on a set of 100 random Web spreadsheets, the system reduces the amount of human labor by 72% to 92%. In addition to the system design, we present the results of a general survey of more than 400,000 spreadsheets downloaded from the Web, giving a novel view of how users organize their data in spreadsheets.
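
To make the last step of that pipeline concrete, the following is a minimal sketch of how already-extracted relational tuples might be serialized as RDF triples in N-Triples syntax. It is not the paper's implementation: the function names `rows_to_triples` and `to_ntriples`, the `http://example.org/` base IRI, the row-index subject scheme, and the sample row are all assumptions made for illustration, and every value is emitted as a plain string literal.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

Row = Dict[str, str]            # one extracted relational tuple: column name -> cell value
Triple = Tuple[str, str, str]   # (subject, predicate, object) in N-Triples term syntax


def rows_to_triples(rows: List[Row], base: str = "http://example.org/") -> List[Triple]:
    """Turn each relational tuple into one triple per column.

    Assumption: each row becomes a subject identified by its position,
    and each column name becomes a predicate under a hypothetical base IRI.
    """
    triples: List[Triple] = []
    for index, row in enumerate(rows):
        subject = f"<{base}row/{index}>"
        for column, value in row.items():
            # Percent-encode the column name so it is safe inside an IRI.
            predicate = f"<{base}attr/{quote(column, safe='')}>"
            # Escape backslashes and double quotes for a string literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append((subject, predicate, f'"{escaped}"'))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical tuple, as might result from flattening a hierarchical spreadsheet.
    sample = [{"state": "Michigan", "year": "2010", "population": "9883640"}]
    print(to_ntriples(rows_to_triples(sample)))
```

Using the row position as the subject keeps the sketch simple; a real system would more likely derive subjects from key attributes so that tuples from different spreadsheets can be joined.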