Phrase Table Pruning by Modeling the Content of Phrases
Paper ID : 1506-IST
1Fatemeh Azadi *, 2Shahram Khadivi
1Amirkabir University of Technology
Many of the phrase pairs extracted in phrase-based machine translation systems are of low quality and are not relevant. Their presence in the phrase table not only enlarges it but can also reduce translation quality. Many methods have been proposed to prune these noisy phrase pairs using statistics derived from the phrase table. In this paper we propose a new pruning method that, unlike other similar pruning approaches, uses the content of each side of a phrase pair to estimate its relevance and quality. Topic models are used to model the content of the phrases. Testing this pruning method on a Farsi-English system, we were able to prune more than 50% of the phrase table without significant loss, and in some cases with improvements, in BLEU scores.
Phrase-based statistical machine translation; phrase-table pruning; topic modeling; Farsi-English