Internet directories such as Yahoo! are an approach to improving the efficacy and efficiency of Information Retrieval (IR) on the Web, as pages (documents) are organized into hierarchical categories, and similar pages are grouped together. Most search engines on the Web find documents that are assigned to a single classification hierarchy. Categories in the hierarchy are carefully defined by human experts, and documents are well organized. However, a single hierarchy in one language is often insufficient to find all relevant material, as each hierarchy tends to have some bias in both defining its hierarchical structure and classifying documents. Moreover, documents written in a language other than the user's native language often include large amounts of information related to the user's request. In this article, we propose a method of integrating cross-language (CL) category hierarchies, that is, the Reuters 96 hierarchy and the UDC code hierarchy for Japanese, by estimating category similarities. The method does not simply merge two different hierarchies into one large hierarchy but instead extracts sets of similar categories, where each element of a set is relevant to the others. It consists of three steps. First, we classify documents from one hierarchy into categories of the other hierarchy using a cross-language text classification (CLTC) technique, and extract category pairs across the two hierarchies. Next, we apply χ² statistics to these pairs to obtain similar category pairs. Finally, we apply the generating function of the Apriori algorithm (Apriori-Gen) to the category pairs and find sets of similar categories. Moreover, we examined whether integrating hierarchies helps to support retrieval of documents with similar contents. The retrieval results showed a 42.7% improvement over the baseline nonhierarchy model, and a 21.6% improvement over a single hierarchy.
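The second and third steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 2x2 contingency layout, and the default threshold of 3.84 (the standard χ² critical value for one degree of freedom at p = 0.05) are assumptions made for illustration, and the abstract does not state which threshold or counting scheme was actually used. The sketch also assumes that the document counts for each category pair have already been produced by the CLTC step.

```python
from itertools import combinations


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: documents assigned to both categories
    b: documents in the first category only
    c: documents in the second category only
    d: documents in neither category
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def similar_pairs(pair_counts, threshold=3.84):
    """Keep category pairs whose chi-square value exceeds the threshold.

    pair_counts maps (category_x, category_y) -> (a, b, c, d).
    The threshold is an assumed value, not one taken from the article.
    """
    return {
        pair
        for pair, counts in pair_counts.items()
        if chi_square_2x2(*counts) > threshold
    }


def apriori_gen(itemsets):
    """Apriori-Gen candidate generation.

    Join step: merge two k-item sets that share their first k-1 items.
    Prune step: drop a candidate if any of its k-item subsets is missing
    from the input collection.
    """
    prev = {tuple(sorted(s)) for s in itemsets}
    candidates = set()
    for p in prev:
        for q in prev:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                cand = p + (q[-1],)
                subsets = combinations(cand, len(cand) - 1)
                if all(sub in prev for sub in subsets):
                    candidates.add(cand)
    return candidates
```

Calling `apriori_gen` repeatedly on its own output, starting from the pairs returned by `similar_pairs`, grows the category sets one element at a time until no new candidate survives the prune step. In this sketch a set is kept only when every pair inside it passed the χ² filter, which is one possible reading of "each element of a set is relevant to the others".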