An enormous amount of semantic data is still encoded in HTML documents. Identifying and annotating the semantic concepts implicit in such documents makes them directly amenable to Semantic Web processing. In this paper we describe a highly automated technique for annotating HTML documents, especially template-based, content-rich documents that contain many different semantic concepts per document. Starting with a small seed of hand-labeled instances of semantic concepts in a set of HTML documents, we bootstrap an annotation process that automatically identifies unlabeled concept instances present in other documents. The bootstrapping technique exploits the observation that semantically related items in content-rich documents exhibit consistency in presentation style and spatial locality, and uses it to learn a statistical model that accurately identifies different semantic concepts in HTML documents drawn from a variety of Web sources. We also present experimental results on the effectiveness of the technique.
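The bootstrapping idea can be illustrated with a minimal self-training sketch. This is not the paper's model: the abstract does not specify the statistical model or the features, so the sketch assumes a small naive Bayes classifier over two presentation-style cues and one coarse spatial cue. The node keys `tag`, `css_class`, and `y`, the 100-pixel band size, and the 0.8 confidence threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def features(node):
    # Hypothetical cues: two presentation-style features and one coarse
    # spatial feature (the vertical band the node falls in).
    return (
        ("tag", node["tag"]),
        ("css_class", node["css_class"]),
        ("band", node["y"] // 100),
    )


def train(labeled):
    # Count how often each feature co-occurs with each concept label.
    label_counts = Counter()
    feat_counts = defaultdict(Counter)
    for node, label in labeled:
        label_counts[label] += 1
        for f in features(node):
            feat_counts[label][f] += 1
    return label_counts, feat_counts


def predict(model, node):
    # Naive Bayes with add-one smoothing; returns (best label, posterior).
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        logp = math.log(n / total)
        for f in features(node):
            logp += math.log((feat_counts[label][f] + 1) / (n + 2))
        scores[label] = logp
    top = max(scores.values())
    norm = sum(math.exp(s - top) for s in scores.values())
    best = max(scores, key=scores.get)
    return best, math.exp(scores[best] - top) / norm


def bootstrap(seed, unlabeled, threshold=0.8, max_rounds=5):
    # Self-training: retrain, adopt confident predictions as new labels,
    # and repeat until nothing new clears the threshold.
    labeled = list(seed)
    pending = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, rest = [], []
        for node in pending:
            label, p = predict(model, node)
            (confident if p >= threshold else rest).append((node, label))
        if not confident:
            break
        labeled.extend(confident)
        pending = [node for node, _ in rest]
    return labeled, pending
```

Calling `bootstrap(seed, unlabeled)` with `seed` as a list of `(node, label)` pairs and `unlabeled` as a list of node dictionaries returns the grown labeled set together with the nodes that never cleared the threshold. Leaving low-confidence nodes unlabeled, rather than forcing a guess, keeps early mistakes from being fed back into later rounds.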