The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link to the target document) for search. We analyze the relative utility of document text, and the text in citing documents near the citation, for classification and description. Results show that the text in citing documents, when available, often has greater discriminative and descriptive power than the text in the target document itself. The combination of evidence from a document and citing documents can improve on either information source alone. Moreover, by ranking words and phrases in the citing documents according to expected entropy loss, we are able to accurately name clusters of web pages, even with very few positive examples. Our results confirm, quantify, and extend previous research using web structure in these areas, introducing new methods for classification and description of pages.
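As a rough illustration of ranking terms by expected entropy loss, the sketch below scores each term by how much knowing its presence or absence reduces the entropy of the positive/negative class label. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the representation of each page as a set of terms drawn from its citing documents, and the filter that keeps only terms relatively more frequent in the positive set are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with `pos` positive and `neg` negative items."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def rank_by_expected_entropy_loss(pos_docs, neg_docs):
    """Rank terms by expected entropy loss (information gain).

    pos_docs / neg_docs: lists of term sets, one set per page (assumed here
    to hold words or phrases taken from text in citing documents).
    Returns (term, score) pairs sorted by descending score.
    """
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    total = n_pos + n_neg
    prior = entropy(n_pos, n_neg)

    # Document frequency of each term in the positive and negative sets.
    pos_df = Counter(t for doc in pos_docs for t in set(doc))
    neg_df = Counter(t for doc in neg_docs for t in set(doc))

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        # Assumption: keep only terms relatively more common in positives,
        # since the score itself is symmetric between the two classes.
        if n_pos == 0 or n_neg == 0 or a / n_pos <= b / n_neg:
            continue
        p_present = (a + b) / total
        h_present = entropy(a, b)
        h_absent = entropy(n_pos - a, n_neg - b)
        posterior = p_present * h_present + (1 - p_present) * h_absent
        scores[term] = prior - posterior

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    positives = [{"yahoo", "directory"}, {"yahoo", "search"}]
    negatives = [{"news", "search"}, {"weather", "news"}]
    for term, score in rank_by_expected_entropy_loss(positives, negatives):
        print(f"{term}\t{score:.3f}")
```

The top-ranked terms would then serve as candidate names for the positive cluster. Because the score depends only on document frequencies, it can be computed even when very few positive examples are available, although the estimates become noisier.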