As the Web continues to grow, it has become increasingly difficult to find relevant information using traditional search engines. Topic-specific search engines offer an alternative that supports efficient information retrieval by providing more precise, customized searching within particular domains. Developers of such engines must address two issues: how to locate relevant documents (URLs) on the Web, and how to filter out irrelevant documents from a set collected from the Web. This paper reports our research on the second issue. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. Each Web page is represented by a set of content-based and link-based features, which can serve as input to various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. Two experiments were designed and conducted to compare the proposed Web-feature approach with two existing Web page filtering methods: a keyword-based approach and a lexicon-based approach. The experimental results showed that the proposed approach generally performed better than the benchmark approaches, especially when the number of training documents was small. The proposed approach can be applied in topic-specific search engine development and in other Web applications such as Web content management.
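To make the page representation described above more concrete, the following is a minimal sketch of how a Web page might be turned into a vector of content-based and link-based features. It is an illustration under stated assumptions, not the paper's actual feature set: the lexicon terms, the `Page` structure, and the five features chosen here are all hypothetical, since the abstract does not enumerate them.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical domain lexicon; the abstract does not list the real terms.
LEXICON = {"cancer", "diagnosis", "therapy"}


@dataclass
class Page:
    """Hypothetical container for one collected Web page."""
    url: str
    title: str
    body: str
    anchor_texts: List[str] = field(default_factory=list)  # text of links pointing here
    inlinks: List[str] = field(default_factory=list)       # URLs of pages linking here


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexicon_hits(text: str) -> int:
    """Count tokens that appear in the domain lexicon."""
    return sum(1 for tok in tokenize(text) if tok in LEXICON)


def content_features(page: Page) -> List[float]:
    """Content-based features: lexicon hits in the title and in the body."""
    return [float(lexicon_hits(page.title)), float(lexicon_hits(page.body))]


def link_features(page: Page, pages: Dict[str, Page]) -> List[float]:
    """Link-based features: anchor-text hits, in-link count, and the
    average number of lexicon hits in the bodies of known in-linking pages."""
    anchor_hits = float(sum(lexicon_hits(a) for a in page.anchor_texts))
    neighbors = [pages[u] for u in page.inlinks if u in pages]
    if neighbors:
        avg_neighbor = sum(lexicon_hits(n.body) for n in neighbors) / len(neighbors)
    else:
        avg_neighbor = 0.0
    return [anchor_hits, float(len(page.inlinks)), avg_neighbor]


def feature_vector(page: Page, pages: Dict[str, Page]) -> List[float]:
    """Concatenate content-based and link-based features into one vector."""
    return content_features(page) + link_features(page, pages)


# Toy collection of two pages, invented for illustration only.
EXAMPLE_PAGES = {
    "a": Page("a", "Cancer therapy overview",
              "New therapy options for cancer diagnosis.",
              anchor_texts=["cancer guide"], inlinks=["b"]),
    "b": Page("b", "Links", "A list of cancer resources."),
}

if __name__ == "__main__":
    for url, page in EXAMPLE_PAGES.items():
        print(url, feature_vector(page, EXAMPLE_PAGES))
```

A vector of this kind could then be passed, together with a relevant/irrelevant label, to any supervised learner, such as the neural network or support vector machine named in the abstract. The learner is deliberately left out of the sketch because the abstract does not specify its configuration.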