Enhancing automatic term recognition algorithms with HTML tags processing

Authors:
Milan Lučanský;Marián Šimko;Mária Bieliková
Affiliations:
Slovak University of Technology, Ilkovičova, Bratislava, Slovakia;Slovak University of Technology, Ilkovičova, Bratislava, Slovakia;Slovak University of Technology, Ilkovičova, Bratislava, Slovakia
Venue:
Proceedings of the 12th International Conference on Computer Systems and Technologies
Year:
2011

Abstract

We focus on mining relevant information from web pages. Unlike plain text documents, web pages contain an additional source of potentially relevant information: easily processable markup. We propose an approach to keyword extraction that enhances Automatic Term Recognition (ATR) algorithms, which are intended for processing plain text documents, with an analysis of the HTML tags present in the document. We distinguish tags that have semantic potential. We present the results of an experiment conducted on a set of Wikipedia pages. The experiment shows that the enhancement yields better results than using ATR algorithms alone.
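
The general idea of combining a plain-text term score with HTML tag information can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify which tags are considered or how they are weighted, so the tag set and weights in `TAG_WEIGHTS` are hypothetical, and a simple word count stands in for a real ATR algorithm. The names `TagAwareCounter` and `score_terms` are likewise illustrative.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights for tags assumed to carry semantic potential.
# These values are illustrative only and are not taken from the paper.
TAG_WEIGHTS = {
    "title": 3.0,
    "h1": 2.5,
    "h2": 2.0,
    "b": 1.5,
    "strong": 1.5,
    "em": 1.3,
    "a": 1.2,
}


class TagAwareCounter(HTMLParser):
    """Counts words, weighting each occurrence by its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of currently open tags
        self.scores = Counter()  # word -> accumulated weighted count

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching tag, tolerating sloppy HTML.
        if tag in self.open_tags:
            while self.open_tags and self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the strongest weight among all enclosing tags (1.0 if none apply).
        weight = max(
            (TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags), default=1.0
        )
        for word in re.findall(r"[a-z]+", data.lower()):
            self.scores[word] += weight


def score_terms(html):
    """Return a Counter of tag-weighted word scores for an HTML string."""
    parser = TagAwareCounter()
    parser.feed(html)
    parser.close()
    return parser.scores
```

Calling `score_terms(page).most_common(10)` on a page's HTML source would list its ten highest-scoring words. In a fuller pipeline, the word count would be replaced by an actual ATR score, and the tag weight would be applied to that score instead.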