Recent Publications

The latest 10 papers published or under review

Computer Assisted Annotation of Tension Development in TED Talks through Crowdsourcing

Seungwon Yoon, Wonsuk Yang, and Jong C. Park
1st Workshop on Aggregating and analysing crowdsourced annotations for NLP (AnnoNLP), Hong Kong SAR, November 2, 2019.

Generating Sentential Arguments from Diverse Perspectives on Controversial Topic

ChaeHun Park, Wonsuk Yang, and Jong C. Park
2nd Workshop on NLP for Internet Freedom (NLP4IF): Censorship, Disinformation, and Propaganda, Hong Kong SAR, November 3, 2019.

Nonsense!: Quality Control via Two-Step Reason Selection for Annotating Local Acceptability and Related Attributes in News Editorials

Wonsuk Yang, Seungwon Yoon, Ada Carpenter, and Jong C. Park
2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong SAR, November 3-7, 2019.

Assessing the multi-level knowledge prominence perceived by the authors as revealed on their writings

Wonsuk Yang, Jin-Woo Chung, and Jong C. Park
Language and Information, Vol. 23, No. 2, 2019.

A Corpus of Sentence-level Annotations of Local Acceptability with Reasons

Wonsuk Yang, Jung-Ho Kim, Seungwon Yoon, ChaeHun Park, and Jong C. Park
33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33), Hakodate, Japan, September 13-15, 2019.

Automatic Scoring of Semantic Fluency

Najoung Kim, Jung-Ho Kim, Maria K. Wolters, Sarah E. MacPherson, and Jong C. Park
Frontiers in Psychology, Vol. 10, pp. 1020, 2019. (SSCI IF 2.089)
In neuropsychological assessment, semantic fluency is a widely accepted measure of executive function and access to semantic memory. While fluency scores are typically reported as the number of unique words produced, several alternative manual scoring methods have been proposed that provide additional insights into performance, such as clusters of semantically related items. Many automatic scoring methods yield metrics that are difficult to relate to the theories behind manual scoring methods, and most require manually curated linguistic ontologies or large corpus infrastructure. In this paper, we propose a novel automatic scoring method based on Wikipedia, Backlink-VSM, which is easily adaptable to any of the 61 languages with more than 100k Wikipedia entries, can account for cultural differences in semantic relatedness, and covers a wide range of item categories. Our Backlink-VSM method combines relational knowledge as represented by links between Wikipedia entries (Backlink model) with a semantic proximity metric derived from distributional representations (vector space model; VSM). Backlink-VSM yields measures that approximate manual clustering and switching analyses, providing a straightforward link to the substantial literature that uses these metrics. We illustrate our approach with examples from two languages (English and Korean), and two commonly used categories of items (animals and fruits). For both Korean and English, we show that the measures generated by our automatic scoring procedure correlate well with manual annotations. We also successfully replicate findings that older adults produce significantly fewer switches compared to younger adults. Furthermore, our automatic scoring procedure outperforms the manual scoring method and a WordNet-based model in separating younger and older participants, as measured by binary classification accuracy, for both English and Korean datasets. Our method also generalizes to a different category (fruit), demonstrating its adaptability.

A Corpus of Sentential Annotations on News Editorials with Multi-dimensional Credibility Metrics

Wonsuk Yang, Jung-Ho Kim, Jin-Woo Chung, and Jong C. Park
Human-Computer Interaction Korea (HCI), Jeju ICC, Korea, February 13-15, 2019.

Mitigating Stereotypes in Word Embedding through Sentiment Modulation

Huije Lee, Jin-Woo Chung, and Jong C. Park
Korea Software Congress (KSC), Pyeongchang, Korea, December 19-21, 2018.
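Word embeddings are models that effectively capture the semantic information of words in vectors, and pre-trained word2vec is used in many areas of natural language processing that rely on word semantics. However, word embedding models trained on large amounts of text also learn, as semantic information, the stereotypes that people may hold about gender, race, and the like. In this paper, noting that the implicit sentiment attached to words referring to persons or groups can bias a model, we present a method for detecting words that reveal affective stereotypes within an embedding model and, to mitigate stereotypes, propose an embedding model in which the sentiment dimension for person entities is adjusted. Experimental results show that model bias deepened as the sentiment-dimension embedding for person entities was amplified, and that the proposed model reduced bias by 16% compared to the existing model while keeping the change in performance within 1%.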

Neural Grammatical Error Correction by Simulating the Human Learner and the Human Proofreader

Fitsum Gaim, Jin-Woo Chung, and Jong C. Park
Korea Software Congress (KSC), Pyeongchang, Korea, December 19-21, 2018.
We present a learning framework for grammatical error correction (GEC) that leverages the duality of translation to effectively synthesize training signals from a monolingual corpus through a game of two contrasting agents that are initially trained with a small amount of parallel data. The first agent learns to produce ostensibly natural errors, whereas the second learns to proofread the erroneous output into grammatically correct text. This approach not only alleviates the need for large parallel corpora but also exposes the GEC model to a wider range of error types. Our final model is competitive against the best systems, outperforming some of the strongest models on standard benchmarks.

Interpretable Depression Detection from Social Media using Hierarchical Attention Network with Depression Indicators

Hoyun Song
MS Thesis, KAIST, 2018.
To diagnose depression, one of the most harmful mental disorders, effectively, many researchers have used social media to analyze differences in language use. However, detecting depression from social media faces problems such as the small proportion of posts with depression indicators and the difficulty of distinguishing depressive symptoms from temporarily depressed feelings. To address these problems, we propose hierarchical attention with depressive indicators, inspired by the process by which a person with domain knowledge diagnoses depression. Our model provides not only interpretations but also their visualizations, using weights learned through the attention mechanism. With this model, we can investigate different aspects of posts with depressive indicators based on psychological theories, which will help researchers find useful evidence for depressive characteristics.