A brief survey of web data extraction tools
ACM SIGMOD Record
Named Entity recognition without gazetteers
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
University of Durham: description of the LOLITA system as used in MUC-6
MUC6 '95 Proceedings of the 6th conference on Message understanding
Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Postal Address Detection fromWeb Documents
WIRI '05 Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration
Thumbs up?: sentiment classification using machine learning techniques
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
A Meta-model Approach to the Management of Hypertexts in Web Information Systems
ER '08 Proceedings of the ER 2008 Workshops (CMLSA, ECDM, FP-UML, M2AS, RIGiM, SeCoGIS, WISM) on Advances in Conceptual Modeling: Challenges and Opportunities
Expert Systems with Applications: An International Journal
A Structured Approach to Data Reverse Engineering of Web Applications
ICWE '9 Proceedings of the 9th International Conference on Web Engineering
Design challenges and misconceptions in named entity recognition
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Hi-index | 0.00 |
The Semantic Web is gaining increasing interest to fulfill the need of sharing, retrieving, and reusing information. Since Web pages are designed to be read by people, not machines, searching and reusing information on the Web is a difficult task without human participation. To this aim adding semantics (i.e meaning) to a Web page would help the machines to understand Web contents and better support the Web search process. One of the latest developments in this field is Google's Rich Snippets, a service for Web site owners to add semantics to their Web pages. In this paper we provide a structured approach to automatically annotate a Web page with Rich Snippets RDFa tags. Exploiting a data reverse engineering method, combined with several heuristics, and a named entity recognition technique, our method is capable of recognizing and annotating a subset of Rich Snippets' vocabulary, i.e., all the attributes of its Review concept, and the names of the Person and Organization concepts. We implemented tools and services and evaluated the accuracy of the approach on real E-commerce Web sites.