Advanced learning algorithms for cross-language patent retrieval and classification

Authors:
Yaoyong Li;John Shawe-Taylor
Affiliations:
Department of Computer Science, The University of Sheffield, Regent Court, 211, Portobello Street, Sheffield S1 4DP, UK;Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
Venue:
Information Processing and Management: an International Journal
Year:
2007

Citing 13
Cited 14

An Introduction to Support Vector Machines and Other Kernel-based Learning Methods
Latent Semantic Kernels

Journal of Intelligent Information Systems
Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
The Perceptron Algorithm with Uneven Margins

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
An empirical study on retrieval models for different document genres: patents and newspaper articles

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
RCV1: A New Benchmark Collection for Text Categorization Research

The Journal of Machine Learning Research
Cross-language text classification

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
An EM Based Training Algorithm for Cross-Language Text Categorization

WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
Overview of patent retrieval task at NTCIR-3

PATENT '03 Proceedings of the ACL-2003 workshop on Patent corpus processing - Volume 20
Canonical Correlation Analysis: An Overview with Application to Learning Methods

Neural Computation
Using KCCA for Japanese-English cross-language information retrieval and document classification

Journal of Intelligent Information Systems
Cross language text categorization by acquiring multilingual domain models from comparable corpora

ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Support vector machine to synthesise kernels

Proceedings of the First international conference on Deterministic and Statistical Methods in Machine Learning

Can Chinese web pages be classified with English data source?

Proceedings of the 17th international conference on World Wide Web
An empirical study of required dimensionality for large-scale latent semantic indexing applications

Proceedings of the 17th ACM conference on Information and knowledge management
Cross-lingual query classification: a preliminary study

Proceedings of the 2nd ACM workshop on Improving non English web searching
Cross-language query classification using web search for exogenous knowledge

Proceedings of the Second ACM International Conference on Web Search and Data Mining
Development of a multilingual text mining approach for knowledge discovery in patents

SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics
Patent classification system using a new hybrid genetic algorithm support vector machine

Applied Soft Computing
Cross-language text classification using structural correspondence learning

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Retrieving information across multiple, related domains based on user query and feedback: application to patent laws and regulations

Proceedings of the 4th International Conference on Theory and Practice of Electronic Governance
Cross-Lingual Adaptation Using Structural Correspondence Learning

ACM Transactions on Intelligent Systems and Technology (TIST)
Generalized canonical correlation analysis for disparate data fusion

Pattern Recognition Letters
A patent system ontology for facilitating retrieval of patent related information

Proceedings of the 6th International Conference on Theory and Practice of Electronic Governance
Efficiency investigation of manifold matching for text document classification

Pattern Recognition Letters
Analyzing multilingual knowledge innovation in patents

Expert Systems with Applications: An International Journal
Cross-language patent matching via an international patent classification-based concept bridge

Journal of Information Science

Abstract

We study several machine learning algorithms for cross-language patent retrieval and classification. Most other studies that apply machine learning to cross-language information retrieval use learning techniques only for monolingual sub-tasks; in contrast, our learning algorithms exploit bilingual training documents and learn a semantic representation from them. We study Japanese-English cross-language patent retrieval using Kernel Canonical Correlation Analysis (KCCA), a method of correlating linear relationships between two variables in kernel-defined feature spaces. The results are quite encouraging and are significantly better than those obtained by other state-of-the-art methods. We also investigate learning algorithms for cross-language document classification. These learning algorithms are based on KCCA and Support Vector Machines (SVM). In particular, we study two ways of combining KCCA and SVM and find that one particular combination, called SVM_2k, achieves better results than the other learning algorithms for either bilingual or monolingual test documents.
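
To make the KCCA step in the abstract more concrete, the following is a minimal sketch of regularised KCCA applied to two paired "views" of the same documents, followed by a cross-view ranking. It is not the authors' implementation. The synthetic data, the linear kernels, the regularisation value `kappa`, and the function names `kcca` and `rank_by_cosine` are all assumptions made for illustration; a real system would build the kernels from Japanese and English patent text.

```python
import numpy as np

def kcca(Kx, Ky, kappa=0.1, n_components=2):
    """Regularised KCCA on two n x n kernel matrices (simplified sketch).

    Solves (Kx + kappa*I)^-1 Ky (Ky + kappa*I)^-1 Kx alpha = rho^2 alpha
    and recovers the matching dual weights beta for the second view.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    Rx = np.linalg.solve(Kx + kappa * I, Ky)  # (Kx + kappa*I)^-1 Ky
    Ry = np.linalg.solve(Ky + kappa * I, Kx)  # (Ky + kappa*I)^-1 Kx
    evals, evecs = np.linalg.eig(Rx @ Ry)
    # The eigenvalues are real in exact arithmetic; drop numerical noise.
    evals, evecs = evals.real, evecs.real
    order = np.argsort(evals)[::-1][:n_components]
    rho = np.sqrt(np.clip(evals[order], 0.0, None))
    alpha = evecs[:, order]
    beta = (Ry @ alpha) / np.maximum(rho, 1e-12)
    return alpha, beta, rho

def rank_by_cosine(query_vec, doc_vecs):
    """Return document indices sorted by decreasing cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))

# Synthetic paired data (assumption): both views share a 2-D latent "topic".
rng = np.random.default_rng(0)
n = 60
Z = rng.normal(size=(n, 2))
X = Z @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(n, 5))  # view 1
Y = Z @ rng.normal(size=(2, 4)) + 0.01 * rng.normal(size=(n, 4))  # view 2
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# Linear kernels on centred features (assumption; any valid kernel would do).
Kx, Ky = X @ X.T, Y @ Y.T
alpha, beta, rho = kcca(Kx, Ky)

# Project both views into the shared space learned from the paired data.
Px, Py = Kx @ alpha, Ky @ beta
corr = np.corrcoef(Px[:, 0], Py[:, 0])[0, 1]

# Rank all view-2 documents against the first view-1 document.
ranking = rank_by_cosine(Px[0], Py)
print("correlation of first component:", corr)
print("top-ranked view-2 documents:", ranking[:5])
```

In this sketch a query in one language would be mapped through its kernel values against the training documents (`Kx @ alpha`), and candidate documents in the other language through `Ky @ beta`, so that the two can be compared directly in the learned space.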