Mining Web Pages for Data Records

Authors:
Bing Liu; Robert Grossman; Yanhong Zhai
Affiliations:
University of Illinois at Chicago; University of Illinois at Chicago; University of Illinois at Chicago
Venue:
IEEE Intelligent Systems
Year:
2004

Abstract

Much information on the Web is contained in regularly structured objects, or data records. Data records often present their host pages' essential information, such as lists of products and services. Mining data records to extract this information can help you provide value-added services. Existing approaches to data extraction on the Web include supervised learning and automatic techniques. Supervised learning requires substantial human effort, and current automatic techniques provide poor results. To solve this problem, the MDR (mining data records) system exploits two key observations about the layout of data records in Web pages and employs a string-matching algorithm. Experiments show that this new automatic technique significantly outperforms existing methods. In addition, it mines both contiguous and noncontiguous data records.
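
The abstract does not spell out the string-matching step, so the following is only a rough sketch of how such a comparison might look, not the MDR algorithm itself. It assumes that each child of a page region has already been flattened into a list of HTML tag names, and that two neighbours belong to the same group of data records when the normalized Levenshtein edit distance between their tag lists falls below a threshold. The function names, the 0.3 threshold, and the sample tag lists are all hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (here: tag names)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,         # delete x
                            curr[j - 1] + 1,     # insert y
                            prev[j - 1] + cost)) # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def similar_runs(siblings: List[List[str]],
                 threshold: float = 0.3) -> List[List[int]]:
    """Return index runs of adjacent siblings with similar tag strings.

    The threshold is an assumed value chosen for illustration only.
    """
    runs: List[List[int]] = []
    current: List[int] = [0] if siblings else []
    for i in range(1, len(siblings)):
        if normalized_distance(siblings[i - 1], siblings[i]) <= threshold:
            current.append(i)
        else:
            if len(current) > 1:
                runs.append(current)
            current = [i]
    if len(current) > 1:
        runs.append(current)
    return runs


# Hypothetical tag strings for five sibling nodes on a product-listing page.
page_siblings = [
    ["h1"],
    ["tr", "td", "img", "td", "a"],
    ["tr", "td", "img", "td", "a", "b"],
    ["tr", "td", "img", "td", "a"],
    ["p", "span"],
]
candidate_records = similar_runs(page_siblings)
```

In this sample the three table rows form one run of candidate records, while the heading and the trailing paragraph are left out. The sketch groups adjacent nodes only; the noncontiguous records mentioned in the abstract would need additional handling that is not shown here.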