Data mining for hypertext: a tutorial survey

Authors:
Soumen Chakrabarti
Affiliations:
Indian Institute of Technology Bombay
Venue:
ACM SIGKDD Explorations Newsletter
Year:
2000

Abstract

With over 800 million pages covering most areas of human endeavor, the World-wide Web is a fertile ground for data mining research to make a difference to the effectiveness of information search. Today, Web surfers access the Web through two dominant interfaces: clicking on hyperlinks and searching via keyword queries. This process is often tentative and unsatisfactory. Better support is needed for expressing one's information need and dealing with a search result in more structured ways than available now. Data mining and machine learning have significant roles to play towards this end.

In this paper we will survey recent advances in learning and mining problems related to hypertext in general and the Web in particular. We will review the continuum of supervised to semi-supervised to unsupervised learning problems, highlight the specific challenges which distinguish data mining in the hypertext domain from data mining in the context of data warehouses, and summarize the key areas of recent and ongoing research.