Hypertext poses new research challenges for text classification. Hyperlinks, HTML tags, category labels distributed over linked documents, and meta data extracted from related Web sites all provide rich information for classifying hypertext documents. How to represent that information appropriately, and how to learn statistical patterns from it automatically to solve hypertext classification problems, remains an open question. This paper seeks a principled approach to answering it. Specifically, we define five hypertext regularities which may (or may not) hold in a particular application domain, and whose presence (or absence) may significantly influence the optimal design of a classifier. Using three hypertext datasets and three well-known learning algorithms (Naive Bayes, Nearest Neighbor, and First Order Inductive Learner), we examine these regularities in different domains and compare alternative ways to exploit them. Our results show that identifying the hypertext regularities present in the data and selecting appropriate representations for hypertext in a particular domain are crucial, but seldom obvious, in real-world problems. We find that adding the words from a page's linked neighborhood (both inlinks and outlinks) to the page itself was helpful for all our classifiers on one dataset, but more harmful than helpful for two of the three classifiers on the remaining datasets. We also observe that extracting meta data from related Web sites was extremely useful for improving classification accuracy in some of those domains. Finally, the relative performance of the classifiers tested provides insight into their strengths and limitations for classification problems involving diverse and often noisy Web pages.
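The linked-neighborhood representation mentioned above can be illustrated with a small sketch. The abstract does not say how neighbor words are tokenized, weighted, or merged, so everything below is an assumption made for illustration: whitespace tokenization, unweighted merging of word counts, and the hypothetical names `tokenize`, `augment_with_neighbors`, `pages`, and `links`. It is not the authors' implementation.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def augment_with_neighbors(pages, links):
    """Build one bag of words per page that also counts the words of linked pages.

    pages: dict mapping a page id to its text.
    links: iterable of (source, target) hyperlink pairs.
    Both inlinks and outlinks contribute, with equal weight (an assumption).
    """
    neighbors = {pid: set() for pid in pages}
    for src, dst in links:
        if src in pages and dst in pages and src != dst:
            neighbors[src].add(dst)  # dst is reached by an outlink of src
            neighbors[dst].add(src)  # src reaches dst, so it is an inlink of dst

    bags = {}
    for pid, text in pages.items():
        bag = Counter(tokenize(text))          # the page's own words
        for nid in sorted(neighbors[pid]):
            bag.update(tokenize(pages[nid]))   # words from each linked neighbor
        bags[pid] = bag
    return bags


# Toy data, invented for illustration only.
pages = {
    "a": "course syllabus",
    "b": "faculty homepage",
    "c": "course homework",
}
links = [("a", "b"), ("c", "a")]
bags = augment_with_neighbors(pages, links)
```

In this toy example, page "a" picks up words from both "b" (its outlink target) and "c" (its inlink source), while "b" and "c" only pick up words from "a". The resulting counts could be fed to any bag-of-words classifier; as the abstract notes, whether the extra words help or hurt depends on the dataset and the classifier.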