Recent Publications

The latest 10 papers published or under review

Neural Theorem Prover with Word Embedding for Efficient Automatic Annotation

Wonsuk Yang, Hancheol Park, and Jong C. Park
Journal of KIISE 2017. (under review)

Addressing low-resource problems in statistical machine translation of manual signals in sign language

Hancheol Park, Jung-Ho Kim, and Jong C. Park
Journal of KIISE, Vol. 44, No. 2, pp. 163-170, February, 2017.

Neural Theorem Prover with Word Embedding for Efficient Automatic Annotation

Wonsuk Yang, Hancheol Park, and Jong C. Park
Proceedings of the 28th Annual Conference on Human and Cognitive Language Technology (HCLT) pp. 79-84, Busan, Korea, October 07-08, 2016.
(selected as best paper)
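This study aims to design a system that automatically annotates the many unverified documents on the web with verified documents produced by specialized institutions, thereby improving their reliability and automatically adding in-depth information. A usable system for this purpose, the neural theorem prover, has the fundamental problem that it cannot be applied to large-scale corpora; to solve this, we rebuilt it by replacing its internal recurrent module with a word embedding module. To demonstrate the dramatic reduction in training time, we compared the existing system and the rebuilt system side by side on a case in which 28,844 propositions extracted from verified documents on cancer prevention and practice from the National Cancer Information Center are annotated onto 7,844 propositions extracted from cancer-related Wikipedia articles. In the same environment, the training time of the existing system was estimated at 553.8 days, whereas the rebuilt system completed training within 93.1 minutes. The strength of this study is that it resolves the training-time problem that had made the neural theorem prover inapplicable to real-world cases, even though, as a modularizable nonlinear system, it can be combined in parallel with other linear logic and natural language processing modules.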

Enhanced sign language transcription system via hand tracking and pose estimation

Jung-Ho Kim, Najoung Kim, Hancheol Park, and Jong C. Park
Journal of Computing Science and Engineering, Vol. 10, No. 3, pp. 95-101, September, 2016.
In this study, we propose a new system for constructing parallel corpora for sign languages, which are generally underresourced in comparison to spoken languages. In order to achieve scalability and accessibility regarding data collection and corpus construction, our system utilizes deep learning-based techniques and predicts depth information to perform pose estimation on hand information obtainable from video recordings by a single RGB camera. These estimated poses are then transcribed into expressions in SignWriting. We evaluate the accuracy of hand tracking and hand pose estimation modules of our system quantitatively, using the American Sign Language Image Dataset and the American Sign Language Lexicon Video Dataset. The evaluation results show that our transcription system has a high potential to be successfully employed in constructing a sizable sign language corpus using various types of video resources.

Making adjustments to event annotations for improved biological event extraction

Seung-Cheol Baek and Jong C. Park
Journal of Biomedical Semantics, 7:55, doi: 10.1186/s13326-016-0094-9, 16 September 2016. (SCIE IF 1.62)
Current state-of-the-art approaches to biological event extraction train statistical models in a supervised manner on corpora annotated with event triggers and event-argument relations. Inspecting such corpora, we observe that there is ambiguity in the span of event triggers (e.g., "transcriptional activity" vs. "transcriptional"), leading to inconsistencies across event trigger annotations. Such inconsistencies make it quite likely that similar phrases are annotated with different spans of event triggers, suggesting the possibility that a statistical learning algorithm misses an opportunity for generalizing from such event triggers.

We anticipate that adjusting the span of event triggers to reduce these inconsistencies would meaningfully improve the performance of event extraction systems. In this study, we look into this possibility with the corpora provided by the 2009 BioNLP shared task as a proof of concept. We propose an Informed Expectation-Maximization (EM) algorithm, which trains models using the EM algorithm with a posterior regularization technique that consults the gold-standard event trigger annotations in the form of constraints. We further propose four constraints on the possible event trigger annotations to be explored by the EM algorithm.

The algorithm is shown to outperform the state-of-the-art algorithm on the development corpus in a statistically significant manner and on the test corpus by a narrow margin.

The analysis of the annotations generated by the algorithm shows that there are various types of ambiguity in event annotations, even though they could be small in number.

Prosodic and Linguistic Analysis of Semantic Fluency Data: A Window into Speech Production and Cognition

Maria Wolters, Najoung Kim, Jung-Ho Kim, Sarah E. MacPherson, and Jong C. Park
Interspeech 2016, pp. 2085-2089, San Francisco, California, September 8-12, 2016.
Semantic fluency is a commonly used task in psychology that provides data about executive function and semantic memory. Performance on the task is affected by conditions ranging from depression to dementia. The task involves participants naming as many members of a given category (e.g. animals) as possible in sixty seconds. Most of the analyses reported in the literature only rely on word counts and transcribed data, and do not take into account the evidence of utterance planning present in the speech signal. Using data from Korean, we show how prosodic analyses can be combined with computational linguistic analyses of the words produced to provide further insights into the processes involved in producing fluency data. We compare our analyses to an established analysis method for semantic fluency data, manual determination of lexically coherent clusters of words.

Computational Identification of Sequence Variation and Environmental Condition in Clinical Depression from Biomedical Literature

Jinseon You
MS Thesis, KAIST, 2016.
Clinical depression is a complex disease, which is known to be influenced by various factors. As genetic and environmental factors are frequently referred to as the most influential in causing depression, there have been many studies that try to identify genes or proteins and environmental conditions associated with depression. While a number of text-mining (TM) systems identifying information about the genetic factors in the biomedical literature have consequently been developed, there is currently no TM system specifically targeted at extracting environmental conditions. As a result, these TM systems provide biologists with only incomplete information about depression and cannot help them discover its etiology and treatment. In the thesis, we propose a TM system that considers an interaction between genetic and environmental factors associated with depression. The system identifies not only relations between a sequence variation and depression but also changes in the relations according to environmental conditions. In order to develop the system, we split it into two TM subsystems. The first subsystem builds on an existing system for extracting the relations between a sequence variation and depression from the biomedical literature, and classifies the relations as positive or negative at the document level. Based on a dictionary of candidate terms for environmental conditions, the second subsystem identifies the conditions in the biomedical literature containing the binary relations. Using the dependency structure of each sentence, it excludes terms wrongly classified as conditions. The system is the first TM system to consider a ternary relation among sequence variation, disease, and condition. Through the system, we are able to provide more comprehensive information about depression than other systems. We expect that, as the system is applied to other diseases, biologists can easily identify diverse information associated with changes in symptoms of diseases including depression.

Classification of Relations between Biological Entities using Word Vectors

Jimin Park, Jin-Woo Chung, and Jong C. Park
Proceedings of Korea Computer Congress (KCC), pp. 771-773, Jeju, Korea, June 29 - July 1, 2016. (poster presentation)
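There have been many separate studies on identifying relations between the components of a biological system from the text of research papers, and on classifying relations between general words using distributional semantic models, but attempts to combine the two approaches have rarely been reported. In this study, we investigate how a distributional model contributes to predicting the relation that holds between two components of a biological system. The experimental results confirm that distributional models can serve as useful features for identifying relations between biological components.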

Addressing Low-Resource Problems in Statistical Machine Translation of Sign Language

Hancheol Park, Jung-Ho Kim, and Jong C. Park
Proceedings of Korea Computer Congress (KCC), pp. 714-716, Jeju, Korea, June 29 - July 1, 2016.
(selected as best paper)
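Although research on sign language translation using statistical machine translation techniques has recently become active, the scarcity of parallel corpus resources remains unresolved. This study presents three preprocessing methods that can address the problems caused by resource scarcity when a spoken language is translated, by statistical machine translation, into a sign language composed of manual signals. We then confirm through experiments which methods actually improve translation performance in sign language translation under resource scarcity. The proposed preprocessing methods are: expanding the corpus by paraphrasing spoken-language sentences; raising the frequency of individual vocabulary items by lemmatizing spoken-language words; and aligning the sentence constituents of the spoken and sign languages by removing words whose spoken-language parts of speech are not expressed by manual signals. Experiments with an English-American Sign Language parallel corpus show that, of the three preprocessing methods, only paraphrase generation and lemmatization improve translation quality. In particular, the highest performance is obtained when the two methods are applied together.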

Language processing through the collaboration with field experts

Jong C. Park
Invited talk, 2016 Spring conference of the Language Research Institute of Hankuk University of Foreign Studies, May 27, 2016.