[Announcement] Prof. Anoop Sarkar's talk: Two Methods for Morpheme-based Machine Translation

Published June 14, 2012

Prof. Anoop Sarkar of Simon Fraser University will give a talk at KAIST on July 2. The details are as follows:

Title: Two Methods for Morpheme-based Machine Translation
Speaker: Professor Anoop Sarkar (joint work with Ann Clifton and Young-chan Kim)
Affiliation: School of Computing Science, Simon Fraser University, BC (Canada)
Time: July 2 (Monday), 2012, 2:00 PM ~ 3:00 PM
Place: Oh Sang-Su Seminar Room, CS Building (#4443, E3-1)
Host: Professor Jong C. Park (박종철 교수)

Abstract:
Statistical machine translation systems learn how to translate by training on large amounts of previously translated text. The machine learning models used typically assume that the unit of translation is pre-defined by observed word boundaries. As a result, these methods tend to perform poorly when translating into languages such as Finnish or Korean, whose very complex morphological systems give rise to a large vocabulary.

This talk is about our ongoing work on the use of unsupervised morpheme segmentation methods for machine translation. We investigate four methods: a generative model for generating word forms during translation (factored translation), translation based on word segments (sub-word translation), probability models for generating word forms in the target language (morphology generation), and sub-word alignment models that infer morphemes that can improve alignment between source and target languages (sub-word alignment).

The first three methods were evaluated on English-to-Finnish translation and the last on English-to-Korean translation.

We find that morphology-aware translation models yield significantly more fluent translations than a state-of-the-art word-based baseline. We perform a linguistic analysis of the output to show that morpheme-aware translations are more fluent, and we show improvements in automatic evaluation scores such as BLEU.

Biography:
Anoop Sarkar is an Associate Professor in Computing Science at Simon Fraser University in British Columbia, Canada, where he co-directs the Natural Language Laboratory (http://natlang.cs.sfu.ca). He received his Ph.D. from the Department of Computer and Information Sciences at the University of Pennsylvania under Prof. Aravind Joshi for his work on semi-supervised statistical parsing using tree-adjoining grammars. His favorite machine translation system is kriya, the hierarchical phrase-based system developed in his lab at SFU.

His research is focused on statistical parsing and machine translation (exploiting syntax or morphology, or both). His interests also include semi-supervised learning algorithms and stochastic grammars, in particular tree automata and tree-adjoining grammars.

http://www.cs.sfu.ca/~anoop

Contact: Hye-Jin Min (hjmin@nlp.kaist.ac.kr, T. 7741)