Recent Publications

Publications: the latest 10 papers published or under review

Extraction of Gene-Environment Interaction from the Biomedical Literature

Jinseon You, Jin-Woo Chung, Wonsuk Yang, and Jong C. Park
Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP 2017), pp. 865-874, Taipei, Taiwan, November 27 - December 1, 2017.
Genetic information in the literature has been extensively looked into for the purpose of discovering the etiology of a disease. As the gene-disease relation is sensitive to external factors, identifying these factors is important for studying a disease. Environmental influences, which are usually called Gene-Environment interaction (GxE), have been considered as important factors and have extensively been researched in biology. Nevertheless, there is still a lack of systems for automatic GxE extraction from the biomedical literature due to new challenges: (1) there are no preprocessing tools and corpora for GxE, (2) expressions of GxE are often quite implicit, and (3) document-level comprehension is usually required. We propose to overcome these challenges with neural network models and show that a modified sequence-to-sequence model with a static RNN decoder performs well in GxE recognition.

Inferring Implicit Event Locations from Context with Distributional Similarities

Jin-Woo Chung, Wonsuk Yang, Jinseon You, and Jong C. Park
Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), pp. 979-985, Melbourne, Australia, August 19-25, 2017.
Automatic event location extraction from text plays a crucial role in many applications such as infectious disease surveillance and natural disaster monitoring. The fundamental limitation of previous work such as SpaceEval is the limited scope of extraction, targeting only locations that are explicitly stated in a syntactic structure. This leads to missing a lot of implicit information inferable from context in a document, which amounts to nearly 40% of the entire location information. To overcome this limitation for the first time, we present a system that infers the implicit event locations from a given document. Our system exploits distributional semantics, based on the hypothesis that if two events are described by similar expressions, it is likely that they occur in the same location. For example, if "A bomb exploded causing 30 victims" and "many people died from terrorist attack in Boston" are reported in the same document, it is highly likely that the bomb exploded in Boston. Our system achieves an F1-score of 0.58, whereas state-of-the-art classifiers for intra-sentential spatiotemporal relations achieve F1-scores of around 0.60.

Neural Theorem Prover with Word Embedding for Efficient Automatic Annotation

Wonsuk Yang, Hancheol Park, and Jong C. Park
Journal of KIISE, Vol. 44, No. 4, pp. 399-410, April, 2017.

Addressing low-resource problems in statistical machine translation of manual signals in sign language

Hancheol Park, Jung-Ho Kim, and Jong C. Park
Journal of KIISE, Vol. 44, No. 2, pp. 163-170, February, 2017.

Neural Theorem Prover with Word Embedding for Efficient Automatic Annotation

Wonsuk Yang, Hancheol Park, and Jong C. Park
Proceedings of the 28th Annual Conference on Human and Cognitive Language Technology (HCLT), pp. 79-84, Busan, Korea, October 7-8, 2016.
(selected as best paper)
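This study aims to design a system that automatically annotates the many unverified documents on the web with verified documents produced by professional institutions, thereby improving reliability and automatically adding in-depth information. The neural theorem prover, a system that could be used for this purpose, has the fundamental problem that it cannot be applied to large-scale corpora. To address this problem, we rebuilt it by replacing its internal recurrent module with a word embedding module. To demonstrate the dramatic reduction in training time, we compared the existing system and the rebuilt system side by side on the task of annotating 7,844 propositions extracted from cancer-related Wikipedia documents with 28,844 propositions extracted from verified documents on cancer prevention and practice from the National Cancer Information Center. In the same environment, the training time of the existing system was estimated at 553.8 days, whereas the rebuilt system completed training within 93.1 minutes. The advantage of this study is that it resolves the training-time problem that made the neural theorem prover inapplicable to real-world cases, even though, as a modularizable nonlinear system, it can be combined in parallel with other linear logic and natural language processing modules.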

Enhanced sign language transcription system via hand tracking and pose estimation

Jung-Ho Kim, Najoung Kim, Hancheol Park, and Jong C. Park
Journal of Computing Science and Engineering, Vol. 10, No. 3, pp. 95-101, September, 2016.
In this study, we propose a new system for constructing parallel corpora for sign languages, which are generally underresourced in comparison to spoken languages. In order to achieve scalability and accessibility regarding data collection and corpus construction, our system utilizes deep learning-based techniques and predicts depth information to perform pose estimation on hand information obtainable from video recordings by a single RGB camera. These estimated poses are then transcribed into expressions in SignWriting. We evaluate the accuracy of hand tracking and hand pose estimation modules of our system quantitatively, using the American Sign Language Image Dataset and the American Sign Language Lexicon Video Dataset. The evaluation results show that our transcription system has a high potential to be successfully employed in constructing a sizable sign language corpus using various types of video resources.

Making adjustments to event annotations for improved biological event extraction

Seung-Cheol Baek and Jong C. Park
Journal of Biomedical Semantics, 7:55, doi: 10.1186/s13326-016-0094-9, 16 September 2016. (SCIE IF 1.62)
Current state-of-the-art approaches to biological event extraction train statistical models in a supervised manner on corpora annotated with event triggers and event-argument relations. Inspecting such corpora, we observe that there is ambiguity in the span of event triggers (e.g., "transcriptional activity" vs. "transcriptional"), leading to inconsistencies across event trigger annotations. Such inconsistencies make it quite likely that similar phrases are annotated with different spans of event triggers, suggesting the possibility that a statistical learning algorithm misses an opportunity for generalizing from such event triggers.

We anticipate that adjustments to the span of event triggers to reduce these inconsistencies would meaningfully improve the present performance of event extraction systems. In this study, we look into this possibility with the corpora provided by the 2009 BioNLP shared task as a proof of concept. We propose an Informed Expectation-Maximization (EM) algorithm, which trains models using the EM algorithm with a posterior regularization technique, which consults the gold-standard event trigger annotations in a form of constraints. We further propose four constraints on the possible event trigger annotations to be explored by the EM algorithm.

The algorithm is shown to outperform the state-of-the-art algorithm on the development corpus in a statistically significant manner and on the test corpus by a narrow margin.

The analysis of the annotations generated by the algorithm shows that there are various types of ambiguity in event annotations, even though they could be small in number.

Prosodic and Linguistic Analysis of Semantic Fluency Data: A Window into Speech Production and Cognition

Maria Wolters, Najoung Kim, Jung-Ho Kim, Sarah E. MacPherson, and Jong C. Park
Interspeech 2016, pp. 2085-2089, San Francisco, California, September 8-12, 2016.
Semantic fluency is a commonly used task in psychology that provides data about executive function and semantic memory. Performance on the task is affected by conditions ranging from depression to dementia. The task involves participants naming as many members of a given category (e.g. animals) as possible in sixty seconds. Most of the analyses reported in the literature only rely on word counts and transcribed data, and do not take into account the evidence of utterance planning present in the speech signal. Using data from Korean, we show how prosodic analyses can be combined with computational linguistic analyses of the words produced to provide further insights into the processes involved in producing fluency data. We compare our analyses to an established analysis method for semantic fluency data, manual determination of lexically coherent clusters of words.

Computational Identification of Sequence Variation and Environmental Condition in Clinical Depression from Biomedical Literature

Jinseon You
MS Thesis, KAIST, 2016.
Clinical depression is a complex disease, which is known to be influenced by various factors. As genetic and environmental factors are frequently referred to as the most influential in causing depression, there have been many studies that try to identify genes or proteins and environmental conditions associated with depression. While a number of text-mining (TM) systems identifying information about the genetic factors in the biomedical literature have consequently been developed, there is currently no TM system specifically targeted at extracting environmental conditions. As a result, biologists are provided only with incomplete information about depression by these TM systems, unable to help them to discover the etiology and treatment of depression. In the thesis, we propose a TM system that considers an interaction between genetic and environmental factors associated with depression. The system identifies not only relations between a sequence variation and depression but also changes in the relations according to environmental conditions. In order to develop the system, we split the system into two TM subsystems. The first system is applied to an existing system for extracting the relations between a sequence variation and depression from the biomedical literature. The system classifies whether the relations are positive or negative on a document level. Based on the dictionary with candidate terms for environmental conditions, the second system identifies the conditions in the biomedical literature containing the binary relations. Using the dependency of sentence, the system excludes terms wrongly classified as the conditions. The system is a first TM system considering a ternary relation among sequence variation, disease and condition. Through the system, we are able to provide more comprehensive information about depression than other systems. 
We expect that, as the system is applied to other diseases, biologists can easily identify diverse information associated with changes in symptoms of diseases including depression.

Classification of Relations between Biological Entities using Word Vectors

Jimin Park, Jin-Woo Chung, and Jong C. Park
Proceedings of Korea Computer Congress (KCC), pp. 771-773, Jeju, Korea, June 29 - July 1, 2016. (poster presentation)
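There has been much research, separately, on methods for identifying relations between components of a biological system from the text of research papers and on methods for classifying relations between general words using distributional semantic models, but attempts to combine the two have rarely been reported. In this study, we examine how a distributional model contributes to predicting the relation that holds between two components of a biological system. Our experiments confirm that a distributional model can serve as a useful feature for identifying relations between biological components.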