Publications
MS Theses
Effective Pre-processing on Hand Keypoints for Sign Language Recognition
KyungGeun Roh
MS Thesis, KAIST, 2024
Augmentation of Sign Language Poses by Including the Understanding of the Sign Language Domain by Body Part
Aujin Kim
MS Thesis, KAIST, 2023.
Knowledge Transfer for Enhanced Sentiment-Based Abusive Language Detection: Insights from Sarcasm Detection
Dongho Choi
MS Thesis, KAIST, 2023.
Template-based Document Labeling for Dense Retrieval
Sukmin Cho
MS Thesis, KAIST, 2022.
Data Augmentation for Abusive Language Detection via Back-translation and Domain Knowledge
Jisu Shin
MS Thesis, KAIST, 2022.
Information Retrieval by Augmenting Document Representation
Soyeong Jeong
MS Thesis, KAIST, 2021
One of the challenges in information retrieval (IR) is the vocabulary mismatch problem, which refers to the failure to retrieve a query-relevant document when the terms in the query and the document are lexically different but semantically similar. While recent work has tried to tackle the problem by expanding sparse representations with additional relevant terms or by embedding the representations in a learnable dense space, both expansion and dense models generally require a large volume of labeled query-document pairs to train, and human-annotated pairs are often difficult to acquire. The thesis focuses on augmenting document representations, either at the document text level or at the training dataset level, without requiring additional labeled query-document pairs, for both sparse and dense retrieval models. For the sparse retrieval model, we propose Unsupervised Document Expansion with Generation (UDEG), which generates diverse supplementary sentences for the original document without using labels on query-document pairs for training. When generating sentences, we further stochastically perturb their embeddings to produce more diverse sentences for document expansion. We validate UDEG on two standard IR benchmark datasets; the results show that it significantly outperforms relevant expansion baselines. For the dense retrieval model, we propose Document Augmentation for dense Retrieval (DAR), which augments the document representations with interpolation and perturbation. We validate DAR on retrieval tasks with two benchmark datasets, showing that it significantly outperforms relevant baselines on dense retrieval of both seen and unseen documents. We believe that UDEG and DAR contribute to sparse and dense retrievers by augmenting document representations without annotating additional query-document pairs.
Detecting quoted claims and claim speakers in news articles using transformer-based language models
Eugene Jang
MS Thesis, KAIST, 2021.
Quotations in news articles have been suggested as a source of subjective or biased information. This work deals with automatically detecting the quoted claims and speakers of the claims in news articles. We suggest that quoted claims in news articles appear in predictable, but non-trivial ways. We annotate 33 articles for their quoted claims, speakers, and relations between claims and speakers. A dataset created with the presented annotation scheme is used to experiment with i) claim identification, ii) speaker identification, iii) claim-speaker relation identification, iv) claim group identification, and v) speaker group identification. The annotation and experimental results suggest that the contextual information must be taken into account in order to annotate and predict quoted claims and their speakers.
Construction of a dialogue generation model based on persuasion strategies
Junseop Ji
MS Thesis, KAIST, 2020
Generating diverse sentential arguments on controversial topics with a memory-augmented generation model
ChaeHun Park
MS Thesis, KAIST, 2019
Generating humorous statements for public speech with pre-trained language model and tension analyzer
Seungwon Yoon
MS Thesis, KAIST, 2019
Interpretable Depression Detection from Social Media using Hierarchical Attention Network with Depression Indicators
Hoyun Song
MS Thesis, KAIST, 2018.
To diagnose depression, one of the most harmful mental disorders, more effectively, many researchers have turned to social media and analyzed differences in language use. However, detecting depression from social media faces problems such as the small proportion of posts that contain depression indicators and the difficulty of distinguishing depressive symptoms from temporarily depressed feelings. To address these problems, we propose hierarchical attention with depressive indicators, inspired by the process a person with domain knowledge follows when diagnosing depression. Our model provides not only interpretations but also visualizations of them, using the weights learned through the attention mechanism. With this model, we can investigate different aspects of posts with depressive indicators based on psychological theories, which will help researchers find useful evidence of depressive characteristics.
Mitigating Stereotypes in Word Embedding through Sentiment Modulation
Huije Lee
MS Thesis, KAIST, 2018.
Word embedding is an influential framework for quantifying the meaning of a word and is widely used in machine learning as a pre-processing step for natural language processing (NLP). However, word embeddings trained on a large number of contexts encode not only the general syntactic and semantic meaning of a word but also the stereotypes and biases that people may have. This thesis proposes a method to indirectly mitigate the stereotypes in a trained word embedding by modulating the dimension of sentimental attributes in a human entity without imposing equal probability on the compatible social groups. To prevent the word embedding from creating problematic predictions such as a stereotype threat, we modulate the strength of the association between a human entity and a sentimental attribute and thereby indirectly reduce the gender bias of the embedding model. We show that the proposed method preserves the overall embedding performance. We also confirm experimentally that increasing the strength of the association between human entities and sentimental attributes amplifies the model bias.
Using syntactic structure to extract prominent gene regulatory network from the literature
Wonsuk Yang
MS Thesis, KAIST, 2017.
Computational Identification of Sequence Variation and Environmental Condition in Clinical Depression from Biomedical Literature
Jinseon You
MS Thesis, KAIST, 2016.
Clinical depression is a complex disease that is known to be influenced by various factors. As genetic and environmental factors are frequently cited as the most influential causes of depression, many studies have tried to identify genes or proteins and environmental conditions associated with it. While a number of text-mining (TM) systems that identify information about genetic factors in the biomedical literature have consequently been developed, there is currently no TM system specifically targeted at extracting environmental conditions. As a result, these TM systems give biologists only incomplete information about depression, which limits their usefulness for discovering its etiology and treatment.
In the thesis, we propose a TM system that considers the interaction between genetic and environmental factors associated with depression. The system identifies not only relations between a sequence variation and depression but also changes in those relations according to environmental conditions. To develop the system, we split it into two TM subsystems. The first subsystem builds on an existing system for extracting the relations between a sequence variation and depression from the biomedical literature, and classifies at the document level whether each relation is positive or negative. The second subsystem uses a dictionary of candidate terms for environmental conditions to identify the conditions in the biomedical literature that contains these binary relations. Using the dependency structure of each sentence, it excludes terms wrongly classified as conditions.
The system is the first TM system to consider a ternary relation among sequence variation, disease, and condition. With it, we are able to provide more comprehensive information about depression than other systems. We expect that, as the system is applied to other diseases, biologists will be able to easily identify diverse information associated with changes in the symptoms of diseases, including depression.
Synchronization of Non-Manual Signals in Sign Language with Sequence Prediction
Jung-Ho Kim
MS Thesis, KAIST, 2016.
There are various types of non-manual signals in sign language, which carry important linguistic information such as feeling, semantic difference, and nuance. Upon investigating the nature of non-manual signals in the Bible and literature corpus, we find that several types of non-manual signals appear on a single word. This suggests that context may play a role in signed utterances. This thesis experimentally examines the nature of non-manual signals and proposes a prediction model for the non-manual signal sequence, along with an advanced variant of it.
The correlation between non-manual signals is measured by their co-occurrence rate. The result shows close correlations among 'Trunk', 'Head', 'Brow to Eye-gaze' and 'Mouth'. To verify that such context exists, we propose a prediction model using conditional random fields trained on a sequence of 'gloss'-'non-manual signal' pairs, which shows superior results in comparison with a 'gloss'-'non-manual signal' dictionary-based approach.
This result suggests that synchronized non-manual signals can be predicted by the proposed model when it is trained with other non-manual signals. It also means that accuracy is expected to increase as such signals are fine-tuned. In all experiments, performance improves when a 'Brow to Eye-gaze' sequence is used as training data.