Cross-language Web content quality assessment plays an important role in many Web content processing applications. In previous research, statistical systems based on natural language processing, heuristic content, and term frequency-inverse document frequency (TF-IDF) features have proven effective for Web content quality assessment. However, these features are language-dependent and therefore unsuitable for cross-language ranking. This paper proposes a cross-language Web content quality assessment method. First, multi-modal language-independent features are extracted. The extracted features include character features, domain registration features, two-layer hyperlink analysis features, and third-party Web service features. All extracted features are then fused. Feature selection is carried out on the fused features to obtain a new eigenspace. Finally, a cross-language Web content quality model is learned on this eigenspace. Experiments on the ECML/PKDD 2010 Discovery Challenge cross-language datasets demonstrate that features at every scale have discriminative power, that the different feature modalities are complementary to each other, and that feature selection is effective for statistical-learning-based cross-language Web content quality assessment.
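The fuse-then-select steps described above can be illustrated with a minimal sketch. The abstract does not state which selection criterion is used, so information gain on a mean-thresholded split is an assumption here, as are all function names, the toy feature values, and the choice of k. It is not the authors' implementation.

```python
import math
from collections import Counter

def fuse(*modalities):
    """Concatenate per-page feature vectors from several modalities."""
    return [sum((list(row) for row in rows), []) for rows in zip(*modalities)]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(column, labels):
    """Gain from splitting pages on whether a feature exceeds its mean (assumed criterion)."""
    threshold = sum(column) / len(column)
    high = [y for x, y in zip(column, labels) if x > threshold]
    low = [y for x, y in zip(column, labels) if x <= threshold]
    remainder = sum(len(part) / len(labels) * entropy(part) for part in (high, low) if part)
    return entropy(labels) - remainder

def select_features(matrix, labels, k):
    """Return the indices of the k highest-gain columns, in ascending index order."""
    columns = list(zip(*matrix))
    ranked = sorted(range(len(columns)),
                    key=lambda j: information_gain(columns[j], labels),
                    reverse=True)
    return sorted(ranked[:k])

def project(matrix, indices):
    """Keep only the selected columns of the fused feature matrix."""
    return [[row[j] for j in indices] for row in matrix]

# Hypothetical toy data: four pages, three language-independent modalities.
char_feats = [[12, 0.0], [15, 0.1], [60, 0.4], [72, 0.5]]  # e.g. URL length, digit ratio
domain_feats = [[3000], [2500], [30], [45]]                # e.g. domain age in days
link_feats = [[7], [3], [5], [4]]                          # e.g. in-link count
labels = [1, 1, 0, 0]                                      # 1 = high quality, 0 = low quality

fused = fuse(char_feats, domain_feats, link_feats)
selected = select_features(fused, labels, k=3)
reduced = project(fused, selected)
```

In this toy data the in-link column does not separate the two classes at its mean, so it is the one dropped; any statistical learner could then be trained on `reduced` in place of the full fused matrix.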