Mining monolingual and bilingual corpora

Authors:
Chiraz Latiri;Kamel Smaïli;Caroline Lavecchia;David Langlois
Affiliations:
URPAH Team, Computer Sciences Department, Faculty of Sciences of Tunis, El Manar University, Tunisia (corresponding author, e-mail: chiraz.latiri@gnet.tn);LORIA, Speech Group, Vandoeuvre, France;LORIA, Speech Group, Vandoeuvre, France;LORIA, Speech Group, Vandoeuvre, France and IUFM de Lorraine, France
Venue:
Intelligent Data Analysis
Year:
2010

Abstract

In this paper, we describe two new methods for mining monolingual and bilingual text corpora, both of which rely heavily on association rules and triggers. The association-rule-based method is first applied to query expansion. Experiments on French newspapers and on a set of scientific documents show that the proposed approach outperforms the baseline model. The second method focuses on machine translation and is motivated by the results obtained with triggers in statistical language modeling. To build a translation table, association rules and triggers are generalized to mine bilingual corpora; to this end, we propose the concepts of inter-lingual association rules and inter-lingual triggers, respectively. Both methods have been integrated into a real statistical machine translation system. The experiments demonstrate the practical feasibility of the proposed approaches in the context of machine translation and show that inter-lingual triggers achieve better results than those obtained using the third IBM model.
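
The abstract does not give the scoring formula behind inter-lingual triggers, so the following is only a minimal sketch of the general idea, under stated assumptions: each source word is paired with the target words it co-occurs with in aligned sentence pairs, the pairs are scored with a mutual-information-style weight, and the best-scoring target words are kept as candidate translation-table entries. The function name interlingual_triggers, the top_k parameter, the exact score, and the toy French-English corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def interlingual_triggers(pairs, top_k=1):
    """Return, for each source word, its top_k best-scoring target words.

    pairs: list of (source_tokens, target_tokens) aligned sentence pairs.
    The score P(f, e) * log(P(f, e) / (P(f) * P(e))) is an assumption made
    for illustration; it is not claimed to be the authors' formula.
    """
    n = len(pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Count each word at most once per sentence pair.
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        joint_count.update(product(src_set, tgt_set))

    candidates = {}
    for (f, e), c in joint_count.items():
        p_fe = c / n
        p_f = src_count[f] / n
        p_e = tgt_count[e] / n
        score = p_fe * math.log(p_fe / (p_f * p_e))
        candidates.setdefault(f, []).append((score, e))

    # Sort by decreasing score; break ties alphabetically for determinism.
    return {
        f: [e for _, e in sorted(cands, key=lambda t: (-t[0], t[1]))[:top_k]]
        for f, cands in candidates.items()
    }


# Hypothetical toy parallel corpus (French source, English target).
pairs = [
    (["le", "chat"], ["the", "cat"]),
    (["le", "chien"], ["the", "dog"]),
    (["un", "chat"], ["a", "cat"]),
    (["un", "chien"], ["a", "dog"]),
]
triggers = interlingual_triggers(pairs, top_k=1)
print(triggers)
```

On a real corpus, counts this small would be unreliable; a frequency threshold on the source and target words would likely be needed before scoring.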