Determining the titles of Web pages using anchor text and link analysis

Authors:
Ok-Ran Jeong;Jehwan Oh;Dong-Jin Kim;Heetae Lyu;Won Kim
Affiliations:
-;-;-;-;-
Venue:
Expert Systems with Applications: An International Journal
Year:
2014

Citing 17
Cited 0

Algorithms on strings, trees, and sequences: computer science and computational biology

Algorithms on strings, trees, and sequences: computer science and computational biology
Knowledge-based metadata extraction from PostScript files

DL '00 Proceedings of the fifth ACM conference on Digital libraries
Topical locality in the Web

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Machine Learning for Information Extraction in Informal Domains

Machine Learning - Special issue on information retrieval
Probabilistic combination of content and links

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic metadata generation & evaluation

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
The Perceptron Algorithm with Uneven Margins

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Link Based Clustering of Web Search Results

WAIM '01 Proceedings of the Second International Conference on Advances in Web-Age Information Management
RoadRunner: Towards Automatic Data Extraction from Large Web Sites

Proceedings of the 27th International Conference on Very Large Data Bases
Analysis of anchor text for web search

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
A Dynamic Feature Generation System for Automated Metadata Extraction in Preservation of Digital Materials

DIAL '04 Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL'04)
Metaextract: an NLP system to automatically assign metadata

Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Title extraction from bodies of HTML documents and its application to web page retrieval

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Revisiting Lexical Signatures to (Re-)Discover Web Pages

ECDL '08 Proceedings of the 12th European conference on Research and Advanced Technology for Digital Libraries
Columbia Newsblaster: multilingual news summarization on the web

HLT-NAACL--Demonstrations '04 Demonstration Papers at HLT-NAACL 2004
A semi-supervised key phrase extraction approach: learning from title phrases through a document semantic network

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
A very efficient approach to news title and content extraction on the web

Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries

Quantified Score

Hi-index	12.05

Abstract

Determining the titles of Web pages is an important element in characterizing and categorizing the vast number of Web pages. There are a few approaches to automatically determining the titles of Web pages. As an R&D project for the operator of Naver (Korea's largest portal site), we developed a new method that makes use of anchor texts and analysis of links among Web pages. In this paper, we describe our method and present experimental results on its performance.
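
The abstract does not spell out the method, so the following is only a rough sketch of the general idea of using anchor text as evidence for a page title, not the authors' algorithm. It assumes that the pages linking to a target are available as HTML strings and that the most frequent anchor text pointing at a URL is a reasonable title candidate. The names `AnchorCollector` and `candidate_titles` are hypothetical and chosen for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML document."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            if text:
                self.pairs.append((self._href, text))
            self._href = None


def candidate_titles(pages):
    """Map each linked URL to its most frequent anchor text.

    Assumption for this sketch: anchor texts are compared
    case-insensitively and the raw count is the only score.
    """
    votes = {}
    for html in pages:
        collector = AnchorCollector()
        collector.feed(html)
        collector.close()
        for href, text in collector.pairs:
            votes.setdefault(href, Counter())[text.lower()] += 1
    return {href: counts.most_common(1)[0][0] for href, counts in votes.items()}


pages = [
    '<p><a href="http://example.com/a">Naver  News</a></p>',
    '<a href="http://example.com/a">naver news</a> '
    '<a href="http://example.com/a">click here</a>',
]
titles = candidate_titles(pages)
```

A real system would presumably also weight links by the importance of the linking page and filter generic anchors such as "click here"; the sketch leaves both out.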