Recent Publications
Publications
The latest 10 papers published or under review
Computer Assisted Annotation of Tension Development in TED Talks through Crowdsourcing
Seungwon Yoon, Wonsuk Yang, and Jong C. Park
1st Workshop on Aggregating and analysing crowdsourced annotations for NLP (AnnoNLP), Hong Kong SAR, November 2, 2019.
Generating Sentential Arguments from Diverse Perspectives on Controversial Topic
ChaeHun Park, Wonsuk Yang, and Jong C. Park
2nd Workshop on NLP for Internet Freedom (NLP4IF): Censorship, Disinformation, and Propaganda, Hong Kong SAR, November 3, 2019.
Nonsense!: Quality Control via Two-Step Reason Selection for Annotating Local Acceptability and Related Attributes in News Editorials
Wonsuk Yang, Seungwon Yoon, Ada Carpenter, and Jong C. Park
2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong SAR, November 3-7, 2019.
Assessing the multi-level knowledge prominence perceived by the authors as revealed on their writings
Wonsuk Yang, Jin-Woo Chung, and Jong C. Park
Language and Information, Vol. 23, No. 2, 2019.
A Corpus of Sentence-level Annotations of Local Acceptability with Reasons
Wonsuk Yang, Jung-Ho Kim, Seungwon Yoon, ChaeHun Park, and Jong C. Park
33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33), Hakodate, Japan, September 13-15, 2019.
Automatic Scoring of Semantic Fluency
Najoung Kim, Jung-Ho Kim, Maria K. Wolters, Sarah E. MacPherson, and Jong C. Park
Frontiers in Psychology, Vol. 10, pp. 1020, 2019. (SSCI IF 2.089)
In neuropsychological assessment, semantic fluency is a widely accepted measure of executive function and access to semantic memory. While fluency scores are typically reported as the number of unique words produced, several alternative manual scoring methods have been proposed that provide additional insights into performance, such as clusters of semantically related items. Many automatic scoring methods yield metrics that are difficult to relate to the theories behind manual scoring methods, and most require manually-curated linguistic ontologies or large corpus infrastructure. In this paper, we propose a novel automatic scoring method based on Wikipedia, Backlink-VSM, which is easily adaptable to any of the 61 languages with more than 100k Wikipedia entries, can account for cultural differences in semantic relatedness, and covers a wide range of item categories. Our Backlink-VSM method combines relational knowledge as represented by links between Wikipedia entries (Backlink model) with a semantic proximity metric derived from distributional representations (vector space model; VSM). Backlink-VSM yields measures that approximate manual clustering and switching analyses, providing a straightforward link to the substantial literature that uses these metrics. We illustrate our approach with examples from two languages (English and Korean), and two commonly used categories of items (animals and fruits). For both Korean and English, we show that the measures generated by our automatic scoring procedure correlate well with manual annotations. We also successfully replicate findings that older adults produce significantly fewer switches compared to younger adults. Furthermore, our automatic scoring procedure outperforms the manual scoring method and a WordNet-based model in separating younger and older participants measured by binary classification accuracy for both English and Korean datasets. 
Our method also generalizes to a different category (fruit), demonstrating its adaptability.
A Corpus of Sentential Annotations on News Editorials with Multi-dimensional Credibility Metrics
Wonsuk Yang, Jung-Ho Kim, Jin-Woo Chung, and Jong C. Park
Human-Computer Interaction Korea (HCI), Jeju ICC, Korea, February 13-15, 2019.
Mitigating Stereotypes in Word Embedding through Sentiment Modulation
Huije Lee, Jin-Woo Chung, and Jong C. Park
Korea Software Congress (KSC), Pyeongchang, Korea, December 19-21, 2018.
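Word embeddings are models that effectively encode the semantic information of words in low-dimensional vectors, and pre-trained word2vec is used in many areas of natural language processing that rely on word semantics. However, word embedding models trained on large amounts of text also learn, as semantic information, the stereotypes that people may hold about gender, race, and similar attributes. In this paper, noting that implicit sentiment toward words referring to persons or groups can bias a model, we present a method for detecting words that reveal affective stereotypes within an embedding model, and we propose an embedding model in which the sentiment dimension for person entities is adjusted to mitigate stereotypes. Experiments show that the model's bias deepened as the sentiment-dimension embedding for person entities was amplified, and that the proposed model reduced bias by 16% compared to the existing model while keeping the change in performance within 1%.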
Neural Grammatical Error Correction by Simulating the Human Learner and the Human Proofreader
Fitsum Gaim, Jin-Woo Chung, and Jong C. Park
Korea Software Congress (KSC), Pyeongchang, Korea, December 19-21, 2018.
We present a learning framework for grammatical error correction (GEC) that leverages the duality of translation to effectively synthesize training signals from a monolingual corpus through a game of two contrasting agents that are initially trained with a small amount of parallel data. The first agent learns to produce ostensibly natural errors, whereas the second learns to proofread the erroneous output into grammatically correct text. This approach not only alleviates the need for large parallel corpora but also exposes the GEC model to a wider range of error types. Our final model is competitive against the best systems, outperforming some of the strongest models on standard benchmarks.
Interpretable Depression Detection from Social Media using Hierarchical Attention Network with Depression Indicators
Hoyun Song
MS Thesis, KAIST, 2018.
In order to effectively diagnose depression, which is one of the most harmful mental disorders, many researchers have used social media by analyzing differences in language use. However, detecting depression from social media has problems such as a small proportion of posts with depression indicators and the difficulty of distinguishing depressive symptoms from temporarily depressed feelings. To address these problems, we propose hierarchical attention with depressive indicators, inspired by the process by which a person with domain knowledge diagnoses depression. Our model provides not only interpretations but also their visualizations, using weights learned through the attention mechanism. With this model, we can investigate different aspects of posts with depressive indicators based on psychological theories, which will help researchers find useful evidence for depressive characteristics.