This paper presents an approach for categorizing documents according to their implicit locational relevance. We report a thorough evaluation of several classifiers designed for this task, built using support vector machines with multiple alternative feature vectors. Experimental results show that feature vectors combining document terms and URL n-grams with simple features related to the locality of the document (e.g., the total count of place references) lead to high accuracy. The paper also discusses how the proposed categorization approach can help improve tasks such as document retrieval or online contextual advertising.
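As a rough illustration of the combined feature vector described above, the following minimal sketch builds a sparse feature dictionary from document terms, URL character n-grams, and a single locality feature (the count of place references). The toy gazetteer, the function names, the feature-name prefixes, and the choice of trigrams are assumptions made for this example only; the abstract does not specify how the features are actually computed. Training the support vector machine on these vectors is left out.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer (assumption); a real system would rely on a
# full place-name resource to recognize place references.
GAZETTEER = {"lisbon", "porto", "paris"}


def url_ngrams(url, n=3):
    """Return character n-grams of the lowercased URL with the scheme removed."""
    s = re.sub(r"^[a-z]+://", "", url.lower())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def features(text, url, n=3):
    """Build a sparse feature dict: term counts, URL n-gram counts,
    and one locality feature (number of tokens found in the gazetteer)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    feats = Counter()
    for t in tokens:
        feats["term=" + t] += 1
    for g in url_ngrams(url, n):
        feats["url=" + g] += 1
    feats["place_refs"] = sum(1 for t in tokens if t in GAZETTEER)
    return dict(feats)


doc = features("Best restaurants in Lisbon and Porto",
               "http://www.visitlisbon.pt/food")
```

A dictionary like `doc` could then be mapped to a numeric vector and passed to any SVM implementation; which of the three feature groups to include is the kind of alternative the evaluation compares.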