Recent Publications

The latest 10 papers published or under review

Automatic Scoring of Semantic Fluency

Najoung Kim, Jung-Ho Kim, Maria K. Wolters, Sarah E. MacPherson, and Jong C. Park
Frontiers in Psychology, Vol. 10, pp. 1020, 2019. (SSCI IF 2.089)
In neuropsychological assessment, semantic fluency is a widely accepted measure of executive function and access to semantic memory. While fluency scores are typically reported as the number of unique words produced, several alternative manual scoring methods have been proposed that provide additional insights into performance, such as clusters of semantically related items. Many automatic scoring methods yield metrics that are difficult to relate to the theories behind manual scoring methods, and most require manually-curated linguistic ontologies or large corpus infrastructure. In this paper, we propose a novel automatic scoring method based on Wikipedia, Backlink-VSM, which is easily adaptable to any of the 61 languages with more than 100k Wikipedia entries, can account for cultural differences in semantic relatedness, and covers a wide range of item categories. Our Backlink-VSM method combines relational knowledge as represented by links between Wikipedia entries (Backlink model) with a semantic proximity metric derived from distributional representations (vector space model; VSM). Backlink-VSM yields measures that approximate manual clustering and switching analyses, providing a straightforward link to the substantial literature that uses these metrics. We illustrate our approach with examples from two languages (English and Korean), and two commonly used categories of items (animals and fruits). For both Korean and English, we show that the measures generated by our automatic scoring procedure correlate well with manual annotations. We also successfully replicate findings that older adults produce significantly fewer switches compared to younger adults. Furthermore, our automatic scoring procedure outperforms the manual scoring method and a WordNet-based model in separating younger and older participants measured by binary classification accuracy for both English and Korean datasets. 
Our method also generalizes to a different category (fruit), demonstrating its adaptability.

A Corpus of Sentential Annotations on News Editorials with Multi-dimensional Credibility Metrics

Wonsuk Yang, Jung-Ho Kim, Jin-Woo Chung, and Jong C. Park
Human-Computer Interaction Korea (HCI), Jeju ICC, Korea, February 13-15, 2019.

Mitigating Stereotypes in Word Embedding through Sentiment Modulation

Huije Lee, Jin-Woo Chung, and Jong C. Park
Korea Software Congress (KSC), Pyeongchang, Korea, December 19-21, 2018.
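Word embedding is a model that effectively captures the semantic information of words in vector form, and pre-trained word2vec is used in many areas of natural language processing that rely on such information. However, word embedding models trained on large amounts of text also learn, as semantic information, the stereotypes that people may hold about gender, race, and so on. Noting that implicit sentiment toward words referring to persons or groups can bias a model, this paper presents a method for detecting words that reveal affective stereotypes within an embedding model and, to mitigate those stereotypes, proposes an embedding model in which the sentiment dimension for human entities is modulated. Experimental results show that model bias worsens as the sentiment dimension of the embedding for human entities is amplified, and that the proposed model reduces bias by 16% compared to the existing model while keeping the change in performance within 1%.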

Neural Grammatical Error Correction by Simulating the Human Learner and the Human Proofreader

Fitsum Gaim, Jin-Woo Chung, and Jong C. Park
Korea Software Congress (KSC), Pyeongchang, Korea, December 19-21, 2018.
We present a learning framework for grammatical error correction (GEC) that leverages the duality of translation to effectively synthesize training signals from a monolingual corpus through a game of two contrasting agents that are initially trained with a small amount of parallel data. The first agent learns to produce ostensibly natural errors, whereas the second learns to proofread the erroneous output into grammatically correct text. This approach not only alleviates the need for large parallel corpora but also exposes the GEC model to a wider range of error types. Our final model is competitive against the best systems, outperforming some of the strongest models on standard benchmarks.

Interpretable Depression Detection from Social Media using Hierarchical Attention Network with Depression Indicators

Hoyun Song
MS Thesis, KAIST, 2018.
To diagnose depression effectively, as it is one of the most harmful mental disorders, many researchers have turned to social media and analyzed differences in language use. However, detecting depression from social media faces problems such as the small proportion of posts that contain depression indicators and the difficulty of distinguishing depressive symptoms from temporarily depressed feelings. To address these problems, we propose hierarchical attention with depressive indicators, inspired by the process by which a person with domain knowledge diagnoses depression. Our model provides not only interpretations but also visualizations of them, using the weights learned through an attention mechanism. With this model, we can investigate different aspects of posts with depressive indicators based on psychological theories, which will help researchers find useful evidence for depressive characteristics.

Mitigating Stereotypes in Word Embedding through Sentiment Modulation

Huije Lee
MS Thesis, KAIST, 2018.
Word embedding is an influential framework for quantifying the meaning of a word and is widely used as a pre-processing step in machine learning for natural language processing (NLP). However, word embeddings trained on a large number of contexts encode not only the general syntactic and semantic meaning of a word, but also the stereotypes and biases that people may have. This thesis proposes a method to indirectly mitigate the stereotypes in a trained word embedding by modulating the dimension of sentimental attributes of a human entity, without imposing equal probability on the compatible social groups. To prevent the word embedding from producing problematic predictions such as a stereotype threat, we modulate the strength of the association between a human entity and a sentimental attribute and thereby indirectly reduce the gender bias of the embedding model. We show that the proposed method preserves the overall embedding performance. We also confirm experimentally that increasing the strength of the association between human entities and sentimental attributes amplifies the model bias.

Feature Attention Network: Interpretable Depression Detection from Social Media

Hoyun Song, Jinseon You, Jin-Woo Chung, and Jong C. Park
32nd Pacific Asia Conference on Language, Information and Computation (PACLIC 32), The Hong Kong Polytechnic University, Hong Kong SAR, December 1-3, 2018.
Although depression is one of the most common mental disorders, depressed individuals may not be aware of their symptoms at all, and as a result they sometimes miss the appropriate time for treatment. To prevent this problem, many researchers have looked into social media to identify depressed individuals by analyzing differences in language use. While these approaches have recently achieved reasonable performance in detecting depression, especially with deep learning methods, such methods still do not provide a clear way to explain why certain individuals have been detected as depressed. To address this issue, we propose the Feature Attention Network (FAN), inspired by the process by which an expert with background knowledge about depression diagnoses it. We evaluate the performance of our model on a large-scale general forum dataset (Reddit Self-reported Depression Diagnosis). Experimental results demonstrate that FAN performs well and is highly interpretable despite a smaller number of posts in the training data. We investigate different aspects of posts by depressed users through four feature networks built upon psychological studies, which will help researchers examine social media posts for useful evidence of depressive symptoms.

Extracting Supporting Evidence with High Precision via Bi-LSTM Network

ChaeHun Park, Wonsuk Yang, and Jong C. Park
30th Annual Conference on Human & Cognitive Language Technology, Korea University, Seoul, Korea, October 12-13, 2018.
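For an argument to be highly persuasive, it needs sufficient supporting evidence. Automating the extraction of evidence that can logically support the claims within an argument is an essential problem to solve for the development and commercialization of applications such as automated debate systems and decision support for policy voting. However, a system that extracts supporting evidence from web documents presupposes the following two capabilities, which make a high-performance system difficult to implement: 1) a search scope wide enough to secure information that has low direct relevance to the topic of the argument but can still be used as supporting evidence, and 2) the cognitive ability to identify, within the collected information, the evidence that clearly supports the claims of the argument. To extract supporting evidence with high precision and scalability, this study proposes the following staged supporting-evidence extraction system: 1) selection of relevant documents based on TF-IDF similarity, 2) first-stage extraction of supporting evidence through semantic similarity, and 3) second-stage extraction of supporting evidence through a neural network classifier. To verify the effectiveness of the proposed system, we conducted an experiment that extracts supporting evidence for the claims in 4008 editorials from 845675 news articles on the web. In an evaluation on data annotated with claims and supporting evidence, the proposed staged system achieved a precision of 0.41 and 0.70 in the first and second extraction stages, respectively. A subsequent analysis of the evidence extracted by the system confirmed that supporting-evidence extraction grounded in an adequate understanding of the argument is possible.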

Automatic Tension Recognition from Lecture Show Transcripts

Seungwon Yoon, Wonsuk Yang, and Jong C. Park
30th Annual Conference on Human & Cognitive Language Technology, Korea University, Seoul, Korea, October 12-13, 2018.
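Tension constantly affects people when they communicate or read. The concept of tension has been used in a broad sense in natural language processing; among these senses, this paper focuses on the tension that an audience feels in response to a speaker's words in one-way communication such as a lecture, and proposes a method to quantify it. Because it quantifies tension in one-way communication, we expect this study to make it easier to apply the concept of tension to general documents written by a single author. We first build a new corpus annotated with the audience's level of tension in response to the speaker's words. We then show that automatic tension classification is computationally feasible, through a model that predicts tension with context taken into account and experimental results on its classification performance.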

Extracting Spatial Information about Events from Text

Jin-Woo Chung
PhD Dissertation, KAIST, Feb. 2018