Intelligently creating and recommending reusable reformatting rules

Authors:
Christopher Scaffidi; Brad Myers; Mary Shaw
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the 14th international conference on Intelligent user interfaces
Year:
2009

Abstract

When users combine data from multiple sources into a spreadsheet or dataset, the result is often a mishmash of formats, since phone numbers, dates, course numbers and other string-like kinds of data can each be written in many different ways. Although spreadsheets provide features for reformatting numbers and a few specific kinds of string data, they provide no support for the wide range of other kinds of string data that users encounter. We describe a user interface in which users can describe the formats of each kind of data. We provide an algorithm that uses these formats to automatically generate reformatting rules that transform strings from one format to another. In effect, our system enables users to create a small expert system called a "tope" that can recognize and reformat instances of one kind of data. Later, as the user works with a spreadsheet, our system recommends appropriate topes for validating and reformatting the data. With recall above 80% at a query time under 1 second, the recommendation algorithm is accurate enough and fast enough to make useful recommendations in an interactive setting. A laboratory experiment shows that, compared to manual typing, users can reformat sample spreadsheet data more than twice as fast by creating and using topes.
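To make the idea of a tope concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each format can be approximated by a regular expression with named parts plus a function that writes those parts back out, and that recommendation can be approximated by ranking topes on the fraction of cells they recognize. The names Format, Tope, recommend, PHONE and DATE are hypothetical and do not come from the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Format:
    """One way of writing a kind of data (hypothetical structure)."""
    name: str
    pattern: "re.Pattern[str]"                # recognizes a string and names its parts
    render: Callable[[Dict[str, str]], str]   # writes the parts out in this format


@dataclass
class Tope:
    """A kind of data plus the formats it may appear in (illustrative only)."""
    name: str
    formats: List[Format]

    def parse(self, text: str) -> Optional[Dict[str, str]]:
        """Return the named parts if any format recognizes the text, else None."""
        for fmt in self.formats:
            match = fmt.pattern.fullmatch(text.strip())
            if match:
                return match.groupdict()
        return None

    def reformat(self, text: str, target: str) -> Optional[str]:
        """Rewrite text in the target format; None if the text is not recognized."""
        parts = self.parse(text)
        if parts is None:
            return None
        for fmt in self.formats:
            if fmt.name == target:
                return fmt.render(parts)
        raise KeyError(f"unknown format: {target}")


def recommend(topes: List[Tope], column: List[str]) -> List[Tuple[str, float]]:
    """Rank topes by the fraction of cells they recognize (assumed scoring rule)."""
    scored = []
    for tope in topes:
        hits = sum(1 for cell in column if tope.parse(cell) is not None)
        if hits:
            scored.append((tope.name, hits / len(column)))
    return sorted(scored, key=lambda pair: -pair[1])


# Two example topes; the formats chosen here are assumptions for illustration.
PHONE = Tope("phone", [
    Format("dashed",
           re.compile(r"(?P<area>\d{3})-(?P<exch>\d{3})-(?P<line>\d{4})"),
           lambda p: f"{p['area']}-{p['exch']}-{p['line']}"),
    Format("parens",
           re.compile(r"\((?P<area>\d{3})\) (?P<exch>\d{3})-(?P<line>\d{4})"),
           lambda p: f"({p['area']}) {p['exch']}-{p['line']}"),
    Format("dotted",
           re.compile(r"(?P<area>\d{3})\.(?P<exch>\d{3})\.(?P<line>\d{4})"),
           lambda p: f"{p['area']}.{p['exch']}.{p['line']}"),
])

DATE = Tope("date", [
    Format("iso",
           re.compile(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})"),
           lambda p: f"{p['y']}-{p['m']}-{p['d']}"),
    Format("us",
           re.compile(r"(?P<m>\d{2})/(?P<d>\d{2})/(?P<y>\d{4})"),
           lambda p: f"{p['m']}/{p['d']}/{p['y']}"),
])
```

In this sketch, PHONE.reformat("(412) 555-0100", "dashed") rewrites the string by parsing it into named parts and rendering them in the target format, and recommend([PHONE, DATE], column) lists the topes that recognize at least one cell of a column, best match first. Because every format of a tope shares the same named parts, any format can be converted to any other without writing a separate rule for each pair.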