[Announcement] Prof. Anoop Sarkar's talk: Two Methods for Morpheme-based Machine Translation
Published June 14, 2012
Prof. Anoop Sarkar from Simon Fraser University will give a talk at KAIST on July 2, 2012. Details are as follows:
Title: Two Methods for Morpheme-based Machine Translation
Speaker: Professor Anoop Sarkar (joint work with Ann Clifton and Young-chan Kim)
Affiliation: School of Computing Science, Simon Fraser University, BC (Canada)
Time: Monday, July 2, 2012, 2:00 PM to 3:00 PM
Place: Oh Sang-Su Seminar Room, CS Building (#4443, E3-1)
Host: Professor Jong C. Park (박종철 교수)
Abstract:
Statistical machine translation systems learn how to translate by
training on large amounts of previously translated text. The machine
learning models used typically assume that the unit of translation
is pre-defined by observed word boundaries. As a result, these methods
tend to perform poorly when translating into languages such as Finnish
or Korean, which have very complex morphological systems and large
vocabularies.
This talk is about our ongoing work on the use of unsupervised
morpheme segmentation methods for machine translation. We investigate
four methods: a generative model for generating word forms during
translation (factored translation), translation based on word
segments (sub-word translation), probability models for generating
word forms in the target language (morphology generation), and
sub-word alignment models that infer morphemes that can improve
alignment between source and target languages (sub-word alignment).
The first three were evaluated on English-to-Finnish translation and
the last on English-to-Korean translation.
We find that morphology-aware translation models yield significantly
more fluent translations than a state-of-the-art word-based baseline.
A linguistic analysis of the output supports this finding, and we
also show improvements in automatic evaluation scores such as BLEU.
Biography:
Anoop Sarkar is an Associate Professor in Computing Science at Simon Fraser
University in British Columbia, Canada, where he co-directs the Natural
Language Laboratory (http://natlang.cs.sfu.ca). He received his Ph.D. from
the Department of Computer and Information Sciences at the University of
Pennsylvania under Prof. Aravind Joshi for his work on semi-supervised
statistical parsing using tree-adjoining grammars. His favorite machine
translation system is kriya, the hierarchical phrase-based system developed
in his lab at SFU.
His research is focused on statistical parsing and machine translation
(exploiting syntax or morphology, or both). His interests also include
semi-supervised learning algorithms and stochastic grammars, in particular
tree automata and tree-adjoining grammars.
Homepage: http://www.cs.sfu.ca/~anoop
Contact: Hye-Jin Min (hjmin@nlp.kaist.ac.kr, T. 7741)