We focus on mining relevant information from web pages. Unlike plain-text documents, web pages contain an additional source of potentially relevant information: easily processable markup. We propose an approach to keyword extraction that enhances Automatic Term Recognition (ATR) algorithms, which are designed for plain-text documents, with an analysis of the HTML tags present in the document. We distinguish tags that have semantic potential. We present the results of an experiment conducted on a set of Wikipedia pages, which show that the enhancement yields better results than using ATR algorithms alone.
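
To make the general idea concrete, the following is a minimal sketch of combining a plain-text term score with a boost derived from HTML tags. It is not the method evaluated in the paper: raw term frequency stands in for an ATR score, and the tag list and weights in `TAG_WEIGHTS`, as well as the names `TagAwareCounter`, `score_terms`, and `extract_keywords`, are hypothetical choices made only for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights for tags assumed to carry semantic potential.
# The abstract does not state which tags or weights were actually used.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "strong": 1.5, "b": 1.5, "a": 1.2}


class TagAwareCounter(HTMLParser):
    """Counts words and remembers the strongest tag weight each word appeared under."""

    def __init__(self):
        super().__init__()
        self.open_tags = []     # stack of currently open tags
        self.base = Counter()   # raw term frequency (stand-in for an ATR score)
        self.boost = Counter()  # highest tag weight seen per word

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching tag, tolerating unclosed inner tags.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags), default=1.0)
        for word in re.findall(r"[a-z]+", data.lower()):
            self.base[word] += 1
            self.boost[word] = max(self.boost[word], weight)


def score_terms(html):
    """Return {word: frequency * strongest tag weight} for one HTML document."""
    parser = TagAwareCounter()
    parser.feed(html)
    parser.close()
    return {word: parser.base[word] * parser.boost[word] for word in parser.base}


def extract_keywords(html, top_n=5):
    """Return the top_n words by boosted score, ties broken alphabetically."""
    scores = score_terms(html)
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


page = (
    "<html><head><title>Ontology</title></head>"
    "<body><p>the the the page mentions ontology</p></body></html>"
)
print(extract_keywords(page, top_n=2))
```

In this toy page, plain frequency would rank "the" first, while the assumed boost for `<title>` moves "ontology" ahead of it. A real pipeline would replace the frequency count with an actual ATR measure and filter stop words.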