When users combine data from multiple sources into a spreadsheet or dataset, the result is often a mishmash of formats, since phone numbers, dates, course numbers and other string-like kinds of data can each be written in many different ways. Although spreadsheets provide features for reformatting numbers and a few specific kinds of string data, they offer no support for the wide range of other kinds of string data that users encounter. We describe a user interface in which users specify the formats of each kind of data, and we provide an algorithm that uses these formats to automatically generate reformatting rules that transform strings from one format to another. In effect, our system enables users to create a small expert system, called a "tope", that can recognize and reformat instances of one kind of data. Later, as the user works with a spreadsheet, our system recommends appropriate topes for validating and reformatting the data. With a recall of over 80% and a query time of under 1 second, the recommendation algorithm is accurate enough and fast enough to make useful suggestions in an interactive setting. A laboratory experiment shows that, compared to manual typing, users can reformat sample spreadsheet data more than twice as fast by creating and using topes.
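To make the idea of a tope concrete, the following is a minimal sketch of one possible reading of the abstract, not the authors' implementation. It assumes that each format can be expressed as a regular expression with named parts plus an output template, and that a recommendation can be approximated by the fraction of cells a tope recognizes. The names `Format`, `Tope`, `reformat`, `match_rate` and `recommend`, as well as the sample phone and date formats, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Format:
    """One way of writing a kind of data (hypothetical structure)."""
    name: str
    pattern: str    # regex with named groups for the parts of the value
    template: str   # str.format template that writes the parts back out


@dataclass
class Tope:
    """A kind of data with several interchangeable formats (illustrative)."""
    name: str
    formats: List[Format]

    def parse(self, text: str) -> Optional[Dict[str, str]]:
        # Try each format; return the named parts of the first full match.
        for fmt in self.formats:
            match = re.fullmatch(fmt.pattern, text.strip())
            if match:
                return match.groupdict()
        return None

    def reformat(self, text: str, target: str) -> Optional[str]:
        # Recognize the input in any known format, then re-emit its parts
        # using the target format's template. Returns None if unrecognized.
        parts = self.parse(text)
        if parts is None:
            return None
        for fmt in self.formats:
            if fmt.name == target:
                return fmt.template.format(**parts)
        raise ValueError(f"unknown format: {target}")

    def match_rate(self, cells: List[str]) -> float:
        # Fraction of cells recognized by at least one format of this tope.
        if not cells:
            return 0.0
        return sum(self.parse(c) is not None for c in cells) / len(cells)


def recommend(topes: List[Tope], cells: List[str]) -> List[Tuple[str, float]]:
    # Assumed stand-in for recommendation: rank topes by match rate and
    # drop those that recognize nothing.
    scored = [(t.name, t.match_rate(cells)) for t in topes]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])


PHONE = Tope("phone", [
    Format("dashed", r"(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})",
           "{area}-{prefix}-{line}"),
    Format("parens", r"\((?P<area>\d{3})\) (?P<prefix>\d{3})-(?P<line>\d{4})",
           "({area}) {prefix}-{line}"),
    Format("dotted", r"(?P<area>\d{3})\.(?P<prefix>\d{3})\.(?P<line>\d{4})",
           "{area}.{prefix}.{line}"),
])

DATE = Tope("date", [
    Format("iso", r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
           "{year}-{month}-{day}"),
    Format("us", r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})",
           "{month}/{day}/{year}"),
])

if __name__ == "__main__":
    column = ["412-555-0134", "(412) 555-0199", "n/a"]
    print(recommend([PHONE, DATE], column))
    print([PHONE.reformat(c, "dotted") for c in column])
```

In this sketch, reformatting between any two formats falls out of a shared set of named parts, so adding a new format requires only one pattern and one template rather than a rule for every pair of formats. Unrecognized cells such as `n/a` return `None`, which a caller could use to flag values for manual review.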