We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is faster than typical web page classification, as the pages do not have to be fetched and analyzed. Our approach segments the URL into meaningful chunks and adds component, sequential and orthographic features to model salient patterns. The resulting features are used in supervised maximum entropy modeling. We analyze our approach's effectiveness on two standardized domains. Our results show that in certain scenarios, URL-based methods approach the performance of current state-of-the-art full-text and link-based methods.
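The abstract describes the pipeline only at a high level. The Python sketch below illustrates one way the pieces could fit together, under assumptions that are not from the source. URL chunks are found by simply splitting on non-alphanumeric delimiters; the authors' segmentation may be more sophisticated. The feature templates (`tok=`, `host=`, `path=`, `bigram=`, `ortho=`), the function and class names, and the toy URLs are all hypothetical. The maximum entropy model is written as multinomial logistic regression, which is the same model family as a conditional maxent classifier with indicator features. Here it is trained by plain batch gradient ascent with L2 regularization, not necessarily the optimizer the authors used.

```python
import math
import re
from urllib.parse import urlsplit


def tokenize(text):
    """Split text into lowercase alphanumeric chunks (assumed delimiter-based segmentation)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def url_features(url):
    """Map a URL to hypothetical component, sequential and orthographic features."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to recognise the host
    parts = urlsplit(url)
    features, tokens = [], []
    for component, text in (("host", parts.netloc), ("path", parts.path), ("query", parts.query)):
        for tok in tokenize(text):
            tokens.append(tok)
            features.append(f"tok={tok}")          # token identity, in any component
            features.append(f"{component}={tok}")  # component feature: token plus its URL part
            if tok.isdigit():                       # orthographic features: token shape
                features.append("ortho=numeric")
            elif any(c.isdigit() for c in tok):
                features.append("ortho=alnum")
            features.append(f"ortho=len{min(len(tok), 10)}")
    for a, b in zip(tokens, tokens[1:]):            # sequential features: adjacent token pairs
        features.append(f"bigram={a}_{b}")
    return features


class MaxEnt:
    """Conditional maximum entropy model, i.e. multinomial logistic regression."""

    def __init__(self, labels):
        self.labels = sorted(set(labels))
        self.w = {}  # (feature, label) -> weight

    def prob(self, feats):
        scores = {y: sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max score for numerical stability
        exp = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exp.values())
        return {y: e / z for y, e in exp.items()}

    def fit(self, data, epochs=100, lr=0.5, l2=0.01):
        """Batch gradient ascent on the L2-regularised mean conditional log-likelihood."""
        for _ in range(epochs):
            grad = {}
            for feats, gold in data:
                p = self.prob(feats)
                for f in feats:
                    # observed feature count minus its expected count under the model
                    grad[(f, gold)] = grad.get((f, gold), 0.0) + 1.0
                    for y in self.labels:
                        grad[(f, y)] = grad.get((f, y), 0.0) - p[y]
            for key in set(grad) | set(self.w):
                w = self.w.get(key, 0.0)
                self.w[key] = w + lr * (grad.get(key, 0.0) / len(data) - l2 * w)
        return self

    def predict(self, feats):
        p = self.prob(feats)
        return max(p, key=p.get)


# Hypothetical toy training data; these URLs are invented for illustration.
train = [
    ("http://www.cs.example.edu/courses/cs101/", "course"),
    ("http://www.cs.example.edu/courses/cs202/syllabus.html", "course"),
    ("http://www.cs.example.edu/~smith/", "faculty"),
    ("http://www.cs.example.edu/people/faculty/jones.html", "faculty"),
]
model = MaxEnt(label for _, label in train).fit([(url_features(u), y) for u, y in train])
print(model.predict(url_features("http://www.cs.example.edu/courses/cs303/")))
```

Because only the URL string is parsed, classifying a page needs no network fetch. This is the source of the speed advantage claimed above. The unseen token `cs303` contributes nothing directly, but the component feature `path=courses`, the bigram `edu_courses` and the orthographic `ortho=alnum` feature still carry the prediction.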