A text mining approach for automatic construction of hypertexts

Authors:
Hsin-Chang Yang;Chung-Hong Lee
Affiliations:
Department of Information Management, Chang Jung University, Tainan 711, Taiwan, ROC;Department of Electrical Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan, ROC
Venue:
Expert Systems with Applications: An International Journal
Year:
2005




Abstract

Research on automatic hypertext construction has grown rapidly over the last decade because there is an urgent need to convert the enormous volume of legacy documents into web pages. Unlike traditional 'flat' texts, a hypertext contains navigational hyperlinks that point to related hypertexts or to other locations in the same hypertext. Traditionally, these hyperlinks were constructed by the creators of the web pages, with or without the help of authoring tools. However, the enormous number of documents produced each day makes such manual construction impractical. An automatic hypertext construction method is therefore necessary for content providers to efficiently produce adequate information for web surfers. Although most web pages contain non-textual data such as images, sounds, and video clips, text still contributes the major part of the information about a page. It is therefore not surprising that most automatic hypertext construction methods derive from traditional information retrieval research. In this work, we propose a new automatic hypertext construction method based on a text mining approach. Our method applies the self-organizing map algorithm to cluster the flat text documents in a training corpus and generate two maps. We then use these maps to identify the sources and destinations of important hyperlinks within these training documents. The constructed hyperlinks are then inserted into the training documents to convert them into hypertext form. These converted documents form the new corpus. Incoming documents can also be converted into hypertext form and added to the corpus through the same approach. Our method has been tested on a set of flat text documents collected from a newswire site. Although we use only Chinese text documents, our approach can be applied to any documents that can be transformed into a set of index terms.
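
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not say what the two maps contain. The sketch assumes that one map assigns each document to its best-matching neuron and the other assigns each index term to the neuron with the largest weight for that term. The linking rule (anchor terms and destination documents that share a neuron) is likewise an assumption, and all function names and parameters are hypothetical.

```python
import math
import random


def best_matching_unit(weights, v):
    """Index of the neuron whose weight vector is closest to v (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], v)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM on document term vectors.

    Returns one weight vector per neuron, in row-major order.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                        # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)      # shrinking neighborhood
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            br, bc = divmod(bmu, cols)
            for j, w in enumerate(weights):
                r, c = divmod(j, cols)
                d2 = (r - br) ** 2 + (c - bc) ** 2     # grid distance to the BMU
                h = math.exp(-d2 / (2.0 * radius * radius))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights


def document_map(weights, vectors):
    """Assumed first map: each document index -> its best-matching neuron."""
    return [best_matching_unit(weights, v) for v in vectors]


def word_map(weights):
    """Assumed second map: each term index -> neuron with the largest weight for it."""
    dim = len(weights[0])
    return [max(range(len(weights)), key=lambda j: weights[j][k])
            for k in range(dim)]


def candidate_links(vectors, doc_neurons, word_neurons):
    """Assumed linking rule: per document, (anchor terms, destination documents).

    Anchors are terms present in the document that sit on the document's neuron;
    destinations are the other documents on that same neuron.
    """
    links = {}
    for d, v in enumerate(vectors):
        n = doc_neurons[d]
        anchors = [k for k, x in enumerate(v) if x and word_neurons[k] == n]
        targets = [e for e, m in enumerate(doc_neurons) if m == n and e != d]
        links[d] = (anchors, targets)
    return links


# Toy corpus: four documents as binary vectors over four index terms.
docs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
weights = train_som(docs, rows=2, cols=2)
doc_neurons = document_map(weights, docs)
word_neurons = word_map(weights)
print(candidate_links(docs, doc_neurons, word_neurons))
```

A new document could be handled in the same way by computing its best-matching neuron on the trained map and linking it to the documents already assigned there. This mirrors the abstract's remark that incoming documents go through the same approach, though the details here are only illustrative.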