Text is a popular storage and distribution format for information, partly because of generic text-processing tools like Unix grep and sort. Unfortunately, existing generic tools make assumptions about text format (e.g., that each line is a record) that limit their applicability. Custom-built tools are one alternative, but they require substantial time investment and programming expertise. We describe a new approach, lightweight structured text processing, which overcomes these difficulties by enabling users to define text structure interactively and manipulate that structure with generic tools. Our prototype system, LAPIS, is a web browser that can highlight, filter, and sort text regions described by the user. LAPIS has several advantages over other systems: (1) the ability to define custom structure with a simple, intuitive pattern language; (2) interactive specification, which shows pattern matches in context and lets users choose the most convenient combination of manual selection and pattern matching; and (3) external parsers for standard text formats. The pattern language in LAPIS, text constraints, describes text structure in high-level terms, using region relationships such as before, after, in, and contains. We describe an implementation of text constraints that uses a novel, compact representation of region sets as collections of rectangles, or region intervals. We also illustrate the approach with examples of applying LAPIS to web pages, text files, and source code.
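The idea of treating a region set as a collection of rectangles can be illustrated with a small Python sketch. It is not the LAPIS implementation, and it rests on several assumptions. A region is a half-open (start, end) pair of character offsets, so every region is a point in the (start, end) plane. A rectangle therefore stands for all regions whose start and end each fall within a range. The names `Rect`, `before`, `after`, `inside` (used because `in` is a Python keyword), `contains`, `intersect`, and `matches` are hypothetical. The operator semantics are also assumed: "before B" means ending at or before the start of some region in B, and the other operators follow by analogy. LAPIS may define these relations differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Region = Tuple[int, int]  # assumed: (start, end) character offsets, end exclusive


@dataclass(frozen=True)
class Rect:
    """All regions (s, e) with s_lo <= s <= s_hi and e_lo <= e <= e_hi."""
    s_lo: int
    s_hi: int
    e_lo: int
    e_hi: int

    def contains_region(self, r: Region) -> bool:
        s, e = r
        return self.s_lo <= s <= self.s_hi and self.e_lo <= e <= self.e_hi

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        # Intersecting two rectangles intersects their start and end ranges.
        r = Rect(max(self.s_lo, other.s_lo), min(self.s_hi, other.s_hi),
                 max(self.e_lo, other.e_lo), min(self.e_hi, other.e_hi))
        if r.s_lo > r.s_hi or r.e_lo > r.e_hi:
            return None  # empty rectangle
        return r


RegionSet = List[Rect]  # a region set as a union of rectangles


def before(b: List[Region]) -> RegionSet:
    # Assumed semantics: regions that end at or before the start of some b.
    return [Rect(0, bs, 0, bs) for bs, _ in b]


def after(b: List[Region], n: int) -> RegionSet:
    # Assumed semantics: regions that start at or after the end of some b.
    return [Rect(be, n, be, n) for _, be in b]


def inside(b: List[Region]) -> RegionSet:
    # Regions lying entirely within some b.
    return [Rect(bs, be, bs, be) for bs, be in b]


def contains(b: List[Region], n: int) -> RegionSet:
    # Regions that start at or before some b starts and end at or after it ends.
    return [Rect(0, bs, be, n) for bs, be in b]


def intersect(x: RegionSet, y: RegionSet) -> RegionSet:
    # Naive pairwise intersection; a spatial index would avoid the O(|x||y|) cost.
    out = []
    for a in x:
        for c in y:
            r = a.intersect(c)
            if r is not None:
                out.append(r)
    return out


def matches(rs: RegionSet, candidates: List[Region]) -> List[Region]:
    # Keep the candidate regions that fall inside any rectangle of the set.
    return [r for r in candidates if any(rect.contains_region(r) for rect in rs)]


text = "alpha beta gamma delta"
words = [(0, 5), (6, 10), (11, 16), (17, 22)]
between = intersect(after([(6, 10)], len(text)), before([(17, 22)]))
print([text[s:e] for s, e in matches(between, words)])  # → ['gamma']
```

One reason rectangles are compact is that a single relation such as "after B" covers a quadratic number of possible regions yet is stored as one rectangle per region of B. The pairwise `intersect` here is only for clarity.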