Publications

Extraction of Gene-Environment Interaction from the Biomedical Literature

Jinseon You, Jin-Woo Chung, Wonsuk Yang, and Jong C. Park
8th International Joint Conference on Natural Language Processing (IJCNLP 2017) (accepted)

Inferring Implicit Event Locations from Context with Distributional Similarities

Jin-Woo Chung, Wonsuk Yang, Jinseon You, and Jong C. Park
Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), pp. 979-985, Melbourne, Australia, August 19-25, 2017.
Automatic event location extraction from text plays a crucial role in many applications such as infectious disease surveillance and natural disaster monitoring. The fundamental limitation of previous work such as SpaceEval is its limited scope of extraction, which targets only locations that are explicitly stated in a syntactic structure. As a result, much implicit information inferable from context in a document is missed, which amounts to nearly 40% of the entire location information. To overcome this limitation for the first time, we present a system that infers the implicit event locations from a given document. Our system exploits distributional semantics, based on the hypothesis that if two events are described by similar expressions, it is likely that they occur in the same location. For example, if “A bomb exploded causing 30 victims” and “many people died from terrorist attack in Boston” are reported in the same document, it is highly likely that the bomb exploded in Boston. Our system achieves an F1-score of 0.58, where state-of-the-art classifiers for intra-sentential spatiotemporal relations achieve F1-scores of around 0.60.

Neural Theorem Prover with Word Embedding for Efficient Automatic Annotation

Wonsuk Yang, Hancheol Park, and Jong C. Park
Journal of KIISE, Vol. 44, No. 4, pp. 399-410, April, 2017.

Addressing low-resource problems in statistical machine translation of manual signals in sign language

Hancheol Park, Jung-Ho Kim, and Jong C. Park
Journal of KIISE, Vol. 44, No. 2, pp. 163-170, February, 2017.

Neural Theorem Prover with Word Embedding for Efficient Automatic Annotation

Wonsuk Yang, Hancheol Park, and Jong C. Park
Proceedings of the 28th Annual Conference on Human and Cognitive Language Technology (HCLT), pp. 79-84, Busan, Korea, October 07-08, 2016.
(selected as best paper)
This study aims to design a system that automatically annotates the many unverified documents on the Web with verified documents produced by specialized institutions, thereby improving reliability and automatically adding in-depth information. A neural theorem prover could serve this purpose, but it has the fundamental problem that it cannot be applied to large corpora; to solve this, we rebuilt it by replacing its internal recurrent module with a word embedding module. To demonstrate the drastic reduction in training time, we compared the existing system and the rebuilt system side by side on a case in which 28,844 propositions extracted from verified documents on cancer prevention and practice from the National Cancer Information Center are annotated onto 7,844 propositions extracted from cancer-related Wikipedia articles. In the same environment, the training time of the existing system was estimated at 553.8 days, whereas the rebuilt system completed training within 93.1 minutes. The strength of this study is that it resolves the training-time problem that made the neural theorem prover inapplicable to real-world cases, even though, as a modularizable nonlinear system, it can be combined in parallel with other linear logic and natural language processing modules.

Enhanced sign language transcription system via hand tracking and pose estimation

Jung-Ho Kim, Najoung Kim, Hancheol Park, and Jong C. Park
Journal of Computing Science and Engineering, Vol. 10, No. 3, pp. 95-101, September, 2016.
In this study, we propose a new system for constructing parallel corpora for sign languages, which are generally underresourced in comparison to spoken languages. In order to achieve scalability and accessibility regarding data collection and corpus construction, our system utilizes deep learning-based techniques and predicts depth information to perform pose estimation on hand information obtainable from video recordings by a single RGB camera. These estimated poses are then transcribed into expressions in SignWriting. We evaluate the accuracy of hand tracking and hand pose estimation modules of our system quantitatively, using the American Sign Language Image Dataset and the American Sign Language Lexicon Video Dataset. The evaluation results show that our transcription system has a high potential to be successfully employed in constructing a sizable sign language corpus using various types of video resources.

Making adjustments to event annotations for improved biological event extraction

Seung-Cheol Baek and Jong C. Park
Journal of Biomedical Semantics, 7:55, doi: 10.1186/s13326-016-0094-9, 16 September 2016. (SCIE IF 1.62)
Background
Current state-of-the-art approaches to biological event extraction train statistical models in a supervised manner on corpora annotated with event triggers and event-argument relations. Inspecting such corpora, we observe that there is ambiguity in the span of event triggers (e.g., “transcriptional activity” vs. “transcriptional”), leading to inconsistencies across event trigger annotations. Such inconsistencies make it quite likely that similar phrases are annotated with different spans of event triggers, suggesting the possibility that a statistical learning algorithm misses an opportunity for generalizing from such event triggers.

Methods
We anticipate that adjustments to the span of event triggers to reduce these inconsistencies would meaningfully improve the present performance of event extraction systems. In this study, we look into this possibility with the corpora provided by the 2009 BioNLP shared task as a proof of concept. We propose an Informed Expectation-Maximization (EM) algorithm, which trains models using the EM algorithm with a posterior regularization technique that consults the gold-standard event trigger annotations in the form of constraints. We further propose four constraints on the possible event trigger annotations to be explored by the EM algorithm.

Results
The algorithm is shown to outperform the state-of-the-art algorithm on the development corpus in a statistically significant manner and on the test corpus by a narrow margin.

Conclusions
The analysis of the annotations generated by the algorithm shows that there are various types of ambiguity in event annotations, even though they could be small in number.

Prosodic and Linguistic Analysis of Semantic Fluency Data: A Window into Speech Production and Cognition

Maria Wolters, Najoung Kim, Jung-Ho Kim, Sarah E. MacPherson, and Jong C. Park
Interspeech 2016, pp. 2085-2089, San Francisco, California, September 8-12, 2016.
Semantic fluency is a commonly used task in psychology that provides data about executive function and semantic memory. Performance on the task is affected by conditions ranging from depression to dementia. The task involves participants naming as many members of a given category (e.g. animals) as possible in sixty seconds. Most of the analyses reported in the literature only rely on word counts and transcribed data, and do not take into account the evidence of utterance planning present in the speech signal. Using data from Korean, we show how prosodic analyses can be combined with computational linguistic analyses of the words produced to provide further insights into the processes involved in producing fluency data. We compare our analyses to an established analysis method for semantic fluency data, manual determination of lexically coherent clusters of words.

Computational Identification of Sequence Variation and Environmental Condition in Clinical Depression from Biomedical Literature

Jinseon You
MS Thesis, KAIST, 2016.
Clinical depression is a complex disease that is known to be influenced by various factors. As genetic and environmental factors are frequently cited as the most influential causes of depression, many studies have tried to identify genes or proteins and environmental conditions associated with it. While a number of text-mining (TM) systems that identify information about the genetic factors in the biomedical literature have consequently been developed, there is currently no TM system specifically targeted at extracting environmental conditions. As a result, these TM systems provide biologists with only incomplete information about depression, which limits their usefulness for discovering its etiology and treatment. In this thesis, we propose a TM system that considers the interaction between genetic and environmental factors associated with depression. The system identifies not only relations between a sequence variation and depression but also changes in those relations according to environmental conditions. To develop the system, we split it into two TM subsystems. The first subsystem is applied to an existing system for extracting relations between a sequence variation and depression from the biomedical literature, and classifies whether the relations are positive or negative at the document level. Based on a dictionary of candidate terms for environmental conditions, the second subsystem identifies the conditions in the biomedical literature containing the binary relations. Using sentence dependency structure, it excludes terms wrongly classified as conditions. The system is the first TM system to consider a ternary relation among sequence variation, disease, and condition. With this system, we are able to provide more comprehensive information about depression than other systems.
We expect that, as the system is applied to other diseases, biologists can easily identify diverse information associated with changes in symptoms of diseases including depression.

Classification of Relations between Biological Entities using Word Vectors

Jimin Park, Jin-Woo Chung, and Jong C. Park
Proceedings of Korea Computer Congress (KCC), pp. 771-773, Jeju, Korea, June 29 - July 1, 2016. (poster presentation)
Many studies have separately addressed methods for identifying relations between the components of a biological system from the text of research papers, and methods for classifying relations between general words using distributional semantic models, but attempts to combine the two have rarely been reported. In this study, we investigate what contribution a distributional model makes to predicting the relation that holds between two components within a biological system. The experimental results confirm that a distributional model can serve as a useful feature for identifying relations between biological components.

Addressing Low-Resource Problems in Statistical Machine Translation of Sign Language

Hancheol Park, Jung-Ho Kim, and Jong C. Park
Proceedings of Korea Computer Congress (KCC), pp. 714-716, Jeju, Korea, June 29 - July 1, 2016.
(selected as best paper)
Although research on sign language translation using statistical machine translation techniques has recently become active, the scarcity of parallel corpus resources remains unresolved. This study presents three preprocessing methods that address the problems arising from resource scarcity when a spoken language is translated, with statistical machine translation, into a sign language composed of manual signals. We then determine experimentally which methods actually improve translation performance in sign language translation, which suffers from resource scarcity. The proposed preprocessing methods are: expanding the corpus by paraphrasing spoken-language sentences; raising the frequency of individual lexical items by lemmatizing spoken-language words; and aligning sentence constituents between the spoken and sign languages by removing words whose parts of speech are not expressed in manual signals. Experiments on an English-American Sign Language parallel corpus show that, of the three methods, only paraphrase generation and lemmatization improve translation quality. In particular, performance was highest when the two methods were applied together.

Language processing through the collaboration with field experts

Jong C. Park
Invited talk, 2016 Spring conference of the Language Research Institute of Hankuk University of Foreign Studies, May 27, 2016.

Synchronization of Non-Manual Signals in Sign Language with Sequence Prediction

Jung-Ho Kim
MS Thesis, KAIST, 2016.
There are various types of non-manual signals in sign language, which carry important linguistic information such as feeling, semantic difference and nuance. Upon investigation into the nature of non-manual signals in the bible and literature corpus, we find that several types of non-manual signals appear on a single word. This suggests that context may play a role in signed utterances. This thesis experimentally unravels the nature of non-manual signals and proposes a prediction model for the non-manual signal sequence and its advanced approach. The correlation between non-manual signals is measured by utilizing their co-occurrence rate. The result shows close correlations among 'Trunk', 'Head', 'Brow to Eye-gaze' and 'Mouth'. To verify the existence of the context, a prediction model using conditional random fields trained on a sequence of 'gloss'-'non-manual signal' pairs is proposed, which shows superior results in comparison with a 'gloss'-'non-manual signal' dictionary-based approach. This result suggests that synchronized non-manual signals can be predicted by the proposed model when the training is done with other non-manual signals. It also means that the accuracy is expected to increase as we fine-tune such signals. As a result, all experiments show better performance when a sequence of 'Brow to Eye-gaze' is used as training data.

A Morphological Approach to the Longitudinal Detection of Dementia

Najoung Kim and Jong C. Park
HCI Conference Korea, High1 Resort, Gangwon, January 27-29, 2016.
The impact of cognitive impairment on linguistic abilities has been a topic of continuous interest in dementia studies. However, there is a lack of systematic agreement on the longitudinal association between dementia progression and the patients' morphological capacity, and the role of morphological phenomena other than inflection has been relatively underreported. We present a longitudinal study of writings by Iris Murdoch (diagnosed with Alzheimer's disease after her death) and Arthur Conan Doyle (no known record of dementia diagnosis), using two novel measures to account for the usage of complex morphology and lexical innovation. The results imply an association between lexical innovation and cognitive decline caused by dementia, as observed in Murdoch's works beginning from her mid-fifties, in contrast to a milder tendency in Doyle's works. Our findings contribute to a potential for facilitating early diagnosis of dementia through automated language processing approaches.

Biomedical Event Extraction and Management in Big-scale Biomedical Literature

Rize Jin, Jinseon You, and Jong C. Park
42nd KIISE Winter Conference, Phoenix Park, December 17-19, 2015. (poster presentation)
As large volumes of biological literature accumulate, tools such as literature management systems and search engines have emerged to support the work of biology researchers effectively. These tools help biological research considerably, but they still fall short in handling complex operations. Search engines in particular handle word-level queries easily, but their handling of complex queries that express relations between words remains inadequate. In response, the field of biological language processing has actively pursued text mining research such as gene identification and biological event identification in order to handle complex queries, and has reached a considerable level of accuracy. However, because these text mining systems perform more complex computation than earlier tools, they were not designed for large-scale processing, and this has emerged as a serious problem as large-scale processing becomes increasingly necessary in the field. In this study, we use Hadoop, a distributed system, to present a way of upgrading an event identification system, one kind of text mining system, so that it can process large-scale data effectively.

A New Measure of Clustering and Switching Based on Bigrams

Maria Wolters, Sarah MacPherson, Jinseon You, Rize Jin, Seung-Cheol Baek, and Jong C. Park
Psychonomic Society's 56th Annual Meeting, Chicago, USA, November 19-22, 2015. (poster presentation)
The category fluency task (CFT) provides important information about executive abilities such as initiation, set-shifting, and inhibition. CFT sequences are generated by retrieving groups of related words (“clusters”) from semantic memory. Manual annotation schemes have been developed for inferring these clusters from transcribed CFT sequences (Troyer 2008), but these are time-consuming and require training. We propose an automatic analysis technique that is based on a simple statistical model of CFT sequences. This model can be easily adapted to different languages and domains, given sufficient training data. CFT sequences (domain “animals”) were generated by 104 younger adults aged 18-34 years and 100 older adults aged 50-84 years who were native speakers of UK English. The sequences were categorised both manually and using our automated method, with key measures such as the number of switches significantly correlating (rho=0.4, 95% CI [0.28-0.51]). Both methods also resulted in the significant age differences that are consistently reported in the cognitive aging literature.

Corpus Annotation with a Linguistic Analysis of the Associations between Event Mentions and Spatial Expressions

Jin-Woo Chung, Jinseon You, and Jong C. Park
Proceedings of the 29th Pacific Asia Conference on Language, Information, and Computation (PACLIC 29), pp. 539-547, Shanghai, China, October 30-November 1, 2015.
Recognizing spatial information associated with events expressed in natural language text is essential for the proper interpretation of such events. However, the associations between events and spatial information found throughout the text have been much less studied than other types of spatial association as looked into in SpatialML and ISO-Space. In this paper, we present an annotation framework for the linguistic analysis of the associations between event mentions and spatial expressions in broadcast news articles. Based on the corpus annotation and analysis, we discuss which information should be included in the guidelines and what makes it difficult to achieve a high inter-annotator agreement. We also discuss possible improvements on the current corpus and annotation framework for insights into developing an automated system.

A System for Constructing a Korean-to-KSL Parallel Corpus

Jung-Ho Kim, Umang Sehgal, and Jong C. Park
17th Annual Conference on Korean Sign Language, Kongju University, Gongju, Korea, August 15, 2015. (poster presentation)
A Korean-Korean Sign Language (KSL) parallel corpus is essential because it can be used for related dictionaries and automatic translation systems. Unlike ordinary parallel corpora, however, it is not easy to build because of the spatial-linguistic nature of sign language. In this study, we propose a system for efficiently constructing a Korean-KSL parallel corpus.

CoMAGD: Annotation of Gene-Depression Relations

Rize Jin, Jinseon You, Jin-Woo Chung, Hee-Jin Lee, Maria Wolters, and Jong C. Park
Proceedings of the 2015 ACL Workshop on Biomedical Natural Language Processing (BioNLP 2015), pp. 104-113, Beijing, China, July 30, 2015.
Clinical depression is a mental disorder involving genetic and environmental factors. Although much work has studied its genetic causes and numerous candidate genes have consequently been looked into and reported in the biomedical literature, no gene expression changes or mutations regarding depression have yet been adequately collected and analyzed for its full pathophysiology. In this paper, we present a depression-specific annotated corpus for text mining systems that aim to provide a concise review of depression-gene relations, as well as to capture complex biological events such as gene expression changes. We describe the annotation scheme and the conducted annotation procedure in detail. We discuss issues regarding proper recognition of depression terms and entity interactions for future approaches to the task. The corpus is available at http://www.biopathway.org/CoMAGD.

Identification of Depression-Gene Associations from Biomedical Literature

Jinseon You, Rize Jin, Hee-Jin Lee, and Jong C. Park
Korea Computer Congress (KCC), Jeju, Korea, June 24-26, 2015.
Depression is a representative mental disorder of modern life; its symptoms vary with the secretion levels of related hormones, which in turn vary with changes in the expression of related genes. Identifying depression-related genes and uncovering the relations among them would greatly help the development of antidepressants. Research on this topic is actively under way, but it is difficult to identify all related genes at once. In this paper, we adopt a methodology for finding relations between cancer and genes to build a system that automatically identifies relations between depression and genes. We expect this to be of great help in building the corpus needed to uncover deeper relations between depression and genes.

Corpus Annotation for the Linguistic Analysis of Reference Relations between Event and Spatial Expressions in Text

Jin-Woo Chung, Hee-Jin Lee, and Jong C. Park
Language and Information, Vol. 18, No. 2, pp. 141-168, 2014.
Recognizing spatial information associated with events expressed in natural language text is essential not only for the interpretation of such events but also for the understanding of the relations among them. However, spatial information is rarely mentioned as compared to events, and the association between event and spatial expressions is also highly implicit in a text. This makes it difficult to automate the extraction of spatial information associated with events from the text. In this paper, we give a linguistic analysis of how spatial expressions are associated with event expressions in a text. We first present issues in annotating narrative texts with reference relations between event and spatial expressions, and then discuss surface-level linguistic characteristics of such relations based on the annotated corpus to give a helpful insight into developing an automated recognition method.

Construction of a Korean-to-KSL Parallel Corpus by Effective Motion Capture of Hand Shapes

Jung-Ho Kim and Jong C. Park
41st KIISE Winter Conference, Phoenix Park, December 18-20, 2014. (poster presentation)
In this study, we present an efficient way to collect hand shapes for building a parallel corpus between Korean and Korean Sign Language (KSL), using Leap Motion to recognize and collect sign language motions within the range of hand movements. To validate the parallel corpus built with this method, we collected 46 sign motions and additionally selected 54 sign motions that had not been collected beforehand, and observed recognition with an average accuracy of 42.15% and a recall of 58.72% over the 100 signs in total. The proposed approach is highly general, which suggests that data could be collected on a large scale and concurrently.

An Effective Construction of a Korean-to-KSL Parallel Corpus

Jung-Ho Kim and Jong C. Park
Proceedings of the 26th Annual Conference on Human and Cognitive Language Technology (HCLT), pp. 13-17, ChunCheon, Korea, October 10-11, 2014.
(selected as best paper)
This study addresses the construction of a parallel corpus between Korean and Korean Sign Language (KSL) and the problems that it entails. To build the corpus efficiently, we used Kinect and Leap Motion; to validate their performance, we compared our collection method with the glove-based motion recognition and collection method proposed in previous work, and found no significant difference from the results collected with gloves. This suggests that our motion collection method is competitive with the relatively costly glove-based method. Because it also relies on a widely available means of data collection, data can be collected concurrently, so we expect that the construction of a sizable parallel corpus can proceed more efficiently.

Relation Information Extraction using a Comprehensive Representation Scheme: Applications to Oncology

Hee-Jin Lee
PhD Dissertation, KAIST, 2014.
Information extraction (IE) is a task of identifying relevant information from input text and producing structured data as output. While explicit expressions describing the target information are the basis for the development of IE systems, in-depth analysis of the input text becomes necessary when the information is conveyed implicitly in the text. In this dissertation, we address a specialized IE method for gene-cancer relations conveyed implicitly in biomedical text. Automatic identification of gene-cancer relations from a large volume of biomedical text is an important task for cancer research, since changes in genes are known to be the main cause of oncogenesis. In particular, it is essential to understand how a gene affects a cancer and to classify genes into oncogenes (genes that cause cancers), tumor suppressor genes (genes that protect cells from cancers) and biomarkers (genes that indicate normal or cancerous states), since such classification facilitates the process of treatment and diagnosis method development. However, despite the high volume of information on such gene classes that is conveyed implicitly with detailed descriptions about gene and cancer properties, there is not yet an IE system that is targeted at such implicit information. In this dissertation, we claim that in order to classify genes into candidates of oncogenes, tumor suppressor genes and biomarkers, gene-cancer relations described in biomedical text must be characterized with 1) how a gene changes; 2) how a cancer changes; and 3) the causality between the gene and the cancer. We propose a comprehensive representation scheme that identifies gene-cancer relations upon the three aspects above and use it for developing an advanced text mining system for oncogenes, tumor suppressor genes and biomarkers. 
The proposed representation scheme is shown to be adequate enough to describe the set of information that can be identified objectively from biomedical text, giving rise to an annotated corpus, or CoMAGC. The mapping between the proposed representations and the gene classes is encoded into a set of inference rules, which are validated through manual annotation and comparison with other biology databases. We present an implemented IE system that automatically extracts the information as defined by the proposed scheme, or OncoSearch. Together, we anticipate that CoMAGC and OncoSearch will enable more focused research into oncology, in the face of the rapidly accumulating amount of work in the field.

OncoSearch: Cancer Gene Search Engine with Literature Evidence

Hee-Jin Lee, Tien Cuong Dang, Hyunju Lee, and Jong C. Park
Nucleic Acids Research, (1 July 2014) 42 (W1):W416-W421. (SCI IF 8.278)
In order to identify genes that are involved in oncogenesis and to understand how such genes affect cancers, abnormal gene expressions in cancers are actively studied. For an efficient access to the results of such studies that are reported in biomedical literature, the relevant information is accumulated via text-mining tools and made available through the Web. However, current Web tools are not yet tailored enough to allow queries that specify how a cancer changes along with the change in gene expression level, which is an important piece of information to understand an involved gene's role in cancer progression or regression. OncoSearch is a Web-based engine that searches Medline abstracts for sentences that mention gene expression changes in cancers, with queries that specify (i) whether a gene expression level is up-regulated or down-regulated, (ii) whether a certain type of cancer progresses or regresses along with such gene expression change and (iii) the expected role of the gene in the cancer. OncoSearch is available through http://oncosearch.biopathway.org.

Mention-Level Gene Normalization on Multi-Species and Multiple Identifiers

Joon-Yeob Kim
MS Thesis, KAIST, 2014.

Identification of Speakers in Fairytales with Linguistic Clues

Hye-Jin Min, Jin-Woo Chung, and Jong C. Park
Language and Information, Vol. 17, No. 2, pp. 93-121, 2013.
Identifying the speakers of individual utterances mentioned in textual stories is an important step towards developing applications that involve the use of unique characteristics of speakers in stories, such as robot storytelling and story-to-scene generation. Despite its usefulness, this is a challenging task because not only human entities but also animals and even inanimate objects can become speakers, especially in fairytales, so the number of candidates is much larger than in other types of text. In addition, since the action of speaking is not always mentioned explicitly, it is necessary to infer the speaker from the implicitly mentioned speaking behaviors such as appearances or emotional expressions. In this paper, we investigate a method to exploit linguistic clues to identify the speakers of utterances from textual fairytale stories in Korean, especially in order to handle such challenging issues. Compared with the previous work, the present work takes into account additional linguistic features such as vocative roles and pairs of conversation participants, and proposes the use of discourse-level turn-taking behaviors between speakers to further reduce the number of possible candidate speakers. We describe a simple rule-based method to choose a speaker from candidates based on such linguistic features and turn-taking behaviors.

Augmenting Biological Text Mining with Symbolic Inference

Jong C. Park and Hee-Jin Lee
'Biological Knowledge Discovery Handbook', editors: Mourad Elloumi and Albert Y. Zomaya, Wiley, December 27, 2013.
In this chapter, the authors review recent work on such “next-level” text-mining tools. In particular, they focus on the work that uses symbolic inference to augment text-mining, apart from distributional analysis that is based on the co-occurrence of biological terms and statistical methods. By symbolic inference, they refer to the methods of deriving new information from known facts that are represented with nonnumeric symbols to which inference rules are applied deterministically rather than probabilistically. The research reviewed in this chapter targets one of two abstract tasks. The first task is to recognize information not explicitly stated but implied in a document, where the targeted information is often scattered across multiple sentences. The second is to propose newly predicted biological knowledge using information gathered from the literature. They briefly review text-mining work with distributional analysis to contrast the use of symbolic inference with the use of distributional analysis.

On Mention-Level Gene Normalization

Joon-Yeob Kim, Seung-Cheol Baek, Hee-Jin Lee, and Jong C. Park
5th International Symposium on Languages in Biology and Medicine (LBM 2013), Tokyo, Japan, 12th and 13th December, 2013.
Document-level gene normalization (DGN), which produces a list of gene identifiers relevant to an input document, helps database curators to search for articles of interest by indexing articles with gene identifiers. Recent advances in automatic extraction of information from the biology literature call for mention-level gene normalization (MGN) systems. However, there have been no annotated corpora for MGN, probably because of a somewhat unfounded assumption (convertibility assumption) that it might be straightforward to map gene mentions into gene identifiers given a list of gene identifiers for the document. In the present work, we constructed gold standard annotations for the MGN task and assessed the validity of the convertibility assumption with GeneTUKit (Huang et al., 2011), a state-of-the-art DGN system.

Sign Language Animation Generation

Jong C. Park
Invited Presentation, Fall Colloquium, Department of Humanities and Social Sciences, KAIST, November 19, 2013.

CoMAGC: a Corpus with Multi-faceted Annotations of Gene-Cancer Relations

Hee-Jin Lee, Sang-Hyung Shim, Mi-Ryoung Song, Hyunju Lee, and Jong C. Park
BMC Bioinformatics, 14:323, doi:10.1186/1471-2105-14-323, 14 November 2013. (SCI IF 3.02)
Background
In order to access the large amount of information in biomedical literature about genes implicated in various cancers both efficiently and accurately, the aid of text mining (TM) systems is invaluable. Current TM systems do target either gene-cancer relations or biological processes involving genes and cancers, but the former type produces information not comprehensive enough to explain how a gene affects a cancer, and the latter does not provide a concise summary of gene-cancer relations.

Results
In this paper, we present a corpus for the development of TM systems that are specifically targeting gene-cancer relations but are still able to capture complex information in biomedical sentences. We describe CoMAGC, a corpus with multi-faceted annotations of gene-cancer relations. In CoMAGC, a piece of annotation is composed of four semantically orthogonal concepts that together express 1) how a gene changes, 2) how a cancer changes and 3) the causality between the gene and the cancer. The multi-faceted annotations are shown to have high inter-annotator agreement. In addition, we show that the annotations in CoMAGC allow us to infer the prospective roles of genes in cancers and to classify the genes into three classes according to the inferred roles. We encode the mapping between multi-faceted annotations and gene classes into 10 inference rules. The inference rules produce results with high accuracy as measured against human annotations. CoMAGC consists of 821 sentences on prostate, breast and ovarian cancers. Currently, we deal with changes in gene expression levels among other types of gene changes. The corpus is available at http://biopathway.org/CoMAGC under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0).

Conclusions
The corpus will be an important resource for the development of advanced TM systems on gene-cancer relations.

Parsing Dependency Paths to Identify Event-Argument Relation

Seung-Cheol Baek and Jong C. Park
Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP), Nagoya, Japan, October 15-17, 2013, pp. 699-705.
Mentions of event-argument relations, in particular dependency paths between event-referring words and argument-referring words, can be decomposed into meaningful components arranged in a regular way, such as those indicating the type of relations and the others allowing relations with distant arguments (e.g., coordinate conjunction). We argue that the knowledge about arrangements of such components may provide an opportunity for making event extraction systems more robust to training sets, since unseen patterns would be derived by combining seen components. However, current state-of-the-art machine learning based approaches to event extraction tasks take the notion of components at a shallow level by using n-grams of paths. In this paper, we propose two methods called pseudo-count and Bayesian methods to semi-automatically learn PCFGs by analyzing paths into components from the BioNLP shared task training corpus. Each lexical item in the learned PCFGs appears in 2.6 distinct paths on average between event-referring words and argument-referring words, suggesting that they contain recurring components. We also propose a grounded way of encoding multiple parse trees for a single dependency path into feature vectors in linear classification models. We show that our approach can improve the performance of identifying event-argument relations in a statistically significant manner.

Speaker-TTS Voice Mapping towards Natural and Characteristic Robot Storytelling

Hye-Jin Min, Sang-Chae Kim, Joon-Yeob Kim, Jin-Woo Chung, and Jong C. Park
Proceedings of the 22nd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2013), pp. 793-800, Gyeongju, Korea, August 26-29, 2013.
Robot storytelling has potential for practical use in various domains such as entertainment, education, and rehabilitation. However, relying on human-recorded voices for natural storytelling is costly, and automation with text-to-speech (TTS) systems is not readily applicable due to the difficulty of reflecting the full nature of stories in TTS systems. In this paper, we address the problem of automating robot storytelling with a particular focus on two issues: speaker identification and speaker-TTS voice mapping. We first conduct text analysis with rich linguistic clues to identify speakers from a given textual story. We then treat the task of speaker-TTS voice mapping as a graph coloring problem and propose effective algorithms for assigning voices to speakers given a limited number of TTS voices. Finally, we perform a user experiment to validate the usefulness of our method. The results demonstrate that our system significantly outperforms baseline systems and is also more acceptable to users.

Enhancing Readability of Web Documents by Text Augmentation for Deaf People

Jin-Woo Chung, Hye-Jin Min, JoonYeob Kim, and Jong C. Park
International Conference on Web Intelligence, Semantics, and Mining (WIMS), Madrid, Spain, June 12-14, 2013.
Deaf people have particular difficulty in understanding text-based web documents because their mother language, or sign language, is essentially visually oriented. To enhance the readability of text-based web documents for deaf people, we propose a news display system that converts complex sentences in news articles into simple sentences and presents the relations among them with a graphical representation. In particular, we focus on the tasks of 1) identifying subordinate and embedded clauses in complex sentences, 2) relocating them for better readability and 3) displaying the relations among the clauses with the graphical representation. The results of our evaluation show that the proposed system does simplify complex sentences in news articles effectively while maintaining their intended meaning, suggesting that our system can be used in practice to help deaf people to access textual information.

DigSee: Disease Gene Search Engine with Evidence Sentences (version cancer)

Jeongkyun Kim, Seongeun So, Hee-Jin Lee, Jong C. Park, Jung-jae Kim, and Hyunju Lee
Nucleic Acids Research, Vol. 41, No. W1, pp. 501-517, 12 June 2013 (SCI IF 8.026).
Biological events such as gene expression, regulation, phosphorylation, localization and protein catabolism play important roles in the development of diseases. Understanding the association between diseases and genes can be enhanced with the identification of involved biological events in this association. Although biological knowledge has been accumulated in several databases and can be accessed through the Web, there is no specialized Web tool yet allowing for a query into the relationship among diseases, genes and biological events. For this task, we developed DigSee to search MEDLINE abstracts for evidence sentences describing that ‘genes’ are involved in the development of ‘cancer’ through ‘biological events’. DigSee is available through http://gcancer.org/digsee.

Generating Chatting Messages in a Consistent Style with Authorship Attribution Methods

Sang-Chae Kim
MS Thesis, KAIST, 2013.

Blog Corpus-based Clustering Scheme for Category Fluency Test (CFT) Data Clustering

Yong-Jae Lee, Maria Wolters, Hee-Jin Lee, and Jong C. Park
HCI Conference Korea, High1 Resort, Gangwon, Jan. 30-Feb. 1, 2013.
Category Fluency Test (CFT) is one of the most popular methods to screen dementia and is used in particular to evaluate the organization of the semantic memory and verbal fluency of a patient with dementia. The CFT performance is assessed according to the number of items each patient produces during the test. Recently, however, researchers have also proposed to evaluate the performance by considering the pattern of clusters and switches of the CFT data, with efforts to figure out the clusters and switches on the CFT data computationally. In this work, we propose a novel blog corpus-based clustering scheme to analyze the clusters and switches of the CFT data in a computational manner. In addition, we will argue for the need of the blog corpus-based clustering scheme by comparing it with the previous work on automatic CFT data clustering.

Analyzing and Mapping Expressions of Tense for Korean-Korean Sign Language Translation

JoonYeob Kim, Jin-Woo Chung, and Jong C. Park
Proceedings of the KIISE Fall Conference, Vol. 39 No. 2-B, pp. 121-123, Chungnam National University, November 23-24, 2012.
Sign language is a visual language used mainly in the Deaf community, and it differs from Korean, a spoken language, in many aspects of its mode of expression. In particular, Korean marks tense explicitly by attaching specific grammatical morphemes to the predicate, whereas sign language has neither morphemes that combine with the predicate nor separate function words for tense, which makes it difficult to preserve the tense of the predicate. Based on an analysis of the tense expressions found in each sentence of Korean-sign language parallel data, this paper discusses the ways of expressing tense that are needed to convert a given Korean sentence into an appropriate sign language sentence.

Product Name Classification for Product Instance Distinction

Hye-Jin Min and Jong C. Park
The 26th Pacific Asia Conference on Language, Information, and Computation (PACLIC 26), Bali, Indonesia, November 7-10, 2012.
Product names with a temporal cue in a product review often refer to several product instances purchased at different times. Previous approaches to product entity recognition and temporal information analysis do not take into account such temporal cues and thus fail to distinguish different product instances. We propose to formulate the resolution of such product names as a classification problem by utilizing time expressions, event features and other temporal cues for a classifier in two stages, detecting the existence of such temporal cues and identifying the purchase time. The empirical results show that term-based features and existing event-based features together enhance the performance of product instance distinction.

Automatic Speaker Identification in Fairytales towards Robot Storytelling

Hye-Jin Min, Sang-Chae Kim, and Jong C. Park
Proceedings of the 24th Annual Conference on Human and Cognitive Language Technology (HCLT), pp. 77-84, Busan, Korea, October 12-13, 2012.
This study addresses the problem of identifying the speaker of each utterance in a story, with the goal of automatic fairy tale storytelling by robots; the identified speakers can be used to detect the emotion of an utterance and to select a distinct TTS voice for each character. In addition to features widely used in existing rule-based approaches, namely the position of the speaker candidate, whether the candidate is in the nominative or accusative case, and the presence of a speech verb, we use as additional features the semantic classes of characters that frequently appear in fairy tales and verbs related to the entrance and exit of characters. We trained and validated a decision tree with the proposed features on a fairy tale corpus in which humans, animals, plants and inanimate objects can all be speakers. Compared with a rule-based baseline, accuracy improved by up to 49%, and the proposed method was also found to be robust to changes in the data.

Use of Clue Word Annotations as the Silver-standard in Training Models for Biological Event Extraction

Seung-Cheol Baek and Jong C. Park
Proceedings of the 5th International Symposium on Semantic Mining in Biomedicine (SMBM 2012), pp. 34-41, University of Zurich, Switzerland, September 3-4, 2012.
Current state-of-the-art approaches to biological event extraction train models by reconstructing relevant graphs from training sentences, where labeled nodes correspond to tokens that indicate the presence of events and the relations between nodes correspond to the relations between these events and their participants. Since multi-word expressions may also indicate events, these approaches use heuristic rules to define target graphs to reconstruct by mapping various clue words into single tokens. Since training instances define actual problems to solve, the method of deriving graphs must affect the system performance, but there has not been any related study on this aspect, to the best of our knowledge. In this study, we propose an incorporation of an EM algorithm into supervised learning to look for training graphs that are more favorable for model construction. We evaluate our algorithm on the development dataset in the 2009 BioNLP shared task and show that this algorithm makes a statistically meaningful improvement on the performance of trained models over a supervised learning algorithm on a fixed set of training graphs. The models and graphs are available at http://biopathway.org/EventExtraction/.

Identifying Mentions about Long-term Experiences and Sentiment Change on a Specific Target based on Linguistic Analysis: Application to a Product Review Domain

Hye-Jin Min
PhD Dissertation, KAIST, 2012.
People post and share their experiences through social media on the web these days. The resulting user-generated web documents have become a useful source of advice for making a decision or resolving difficulties because people can learn from others’ past successes or failures. Recently, in response to the rapid growth of such documents and the great potential of experience-based information, research has been conducted on analyzing experiences in user-generated web documents. Earlier work has addressed the issue of distinguishing “experience sentences” from others and has proposed a discrimination method based on the linguistic properties of the mentioned events in such sentences. However, such work has focused mostly on a single event at the sentence level in large-scale data, so that a meaningful series of a specific person’s experiences on a particular target has not yet been analyzed fully. This dissertation presents a method to analyze mentions about target-oriented experiences. More specifically, we propose a novel method to identify mentions about a customer’s experiences on a particular product in two aspects: long-term experiences and sentiment change in such experiences. As for long-term experiences, the hypothesis is that two linguistic expressions, time expressions and product names, fully capture the customer’s long-term experiences mentioned in a review. As for sentiment change, the hypothesis is that sentiment change can be determined by detecting the state in such a review where the overall sentiment towards a product instance purchased at a certain time in the past may not be the same as the overall sentiment towards another instance purchased most recently. In this dissertation, we address three major research questions. The first question is about identifying product names. Unlike previous research on identification at the product entity level, instance-level identification for instance distinction should be accounted for.
Our research question is to determine the types of linguistic features that are useful for such distinction. Based on experimental results, we argue that linguistic features including time expressions, term-based features and event features should be combined differently with respect to the linguistic characteristics of the product names referring to each type of instance. More specifically, we argue that the best combination for the distinction between recent purchases and past purchases is time expressions and term-based features, and the best combination for the distinction between recent purchases and recent & past purchases is time expressions and event features. The second question is about sentiment classification regarding product names. The inherent polarity of the adjectival modifier should be blocked when it is used to refer to the property or the identity of the product. Regarding the question of determining the context in which the polarity of the adjectival modifier should be blocked, we argue that the refined blocking rules with the semantic types of nouns, verbs, and clauses based on compositionality-based syntactic rules enhance the sentiment classification performance, especially for neutral sentences. As for product name-sentiment association, we argue that comparative expressions are crucial to associating the compared target with the sentiment opposite to the one in the given grammatical structure, and also argue that the product names referring to generic objects are crucial to discarding the sentiment in the given grammatical structure. The last question is about how we utilize the results from our method. As a practical application, we demonstrate a system that identifies helpful reviews by utilizing the proposed measure. The user study shows that this measure is not only as helpful as the best existing ones, such as ‘helpful vote’ or ‘reviewer rank’, but is also free from undesirable biases.
We also illustrate another application that rates product reviews with respect to sentiment change. The user study shows that the review rating system based on sentiment change is more credible than the system based on the clause-level sentiment classification.

Towards Automatic Evaluation of Category Fluency Test Performance: Distinguishing Groups using Word Clustering

Yong-Jae Lee, Maria Wolters, Hee-Jin Lee, and Jong C. Park
Korea Computer Congress (KCC), Jeju, Korea, June 27-29, 2012.
The Category Fluency Test (CFT) is a widely used verbal fluency test. The standard measure for scoring the test is the number of distinct words that a subject generates during the test. Recently, other measures have also been proposed to evaluate performance, such as clustering and switching. In this study, we examine whether clusters and switches can be assessed using word similarity measures. Based on these measures, we can distinguish between subject groups.

Age and Gender Prediction from Korean Tweets with Stylometric Analysis

Sang-Chae Kim and Jong C. Park
Korea Computer Congress (KCC), Jeju, Korea, June 27-29, 2012.
People develop their own distinctive writing styles under the influence of their surroundings. People of the same age group and gender therefore tend to show similar writing styles. Based on this assumption, we analyzed the style of tweets written by people of various age groups and genders, and conducted experiments on predicting the age group and gender of the author of an arbitrary tweet.
By combining features built from expressions that frequently appear in Korean web language with n-gram features, which depend comparatively less on the data, we obtained prediction results with an accuracy about 25% higher than the maximum-likelihood baseline. We also gained a better understanding of how effectively each feature set contributes to the prediction.

Quality Analysis of User-generated Content on the Web

Jong C. Park and Hye-Jin Min
'Knowledge Service Engineering Handbook', editors: Jussi Kantola and Waldemar Karwowski, CRC Press, Taylor & Francis Group, pp. 197–220, May, 2012.

E3Net: A System for Exploring E3-mediated Regulatory Networks of Cellular Functions

Youngwoong Han, Hodong Lee, Jong C. Park, and Gwan-Su Yi
Molecular and Cellular Proteomics, March Issue, 2012, doi:10.1074/mcp.O111.014076, December 22, 2011. (SCI IF 8.35)
Ubiquitin-protein ligase (E3) is a key enzyme targeting specific substrates in diverse cellular processes for ubiquitination and degradation. The existing findings of substrate specificity of E3 are, however, scattered over a number of resources, making it difficult to study them together with an integrative view. Here we present E3Net, a web-based system that provides a comprehensive collection of available E3-substrate specificities and a systematic framework for the analysis of E3-mediated regulatory networks of diverse cellular functions. Currently, E3Net contains 2201 E3s and 4896 substrates in 427 organisms and 1671 E3-substrate specific relations between 493 E3s and 1277 substrates in 42 organisms, extracted mainly from MEDLINE abstracts and UniProt comments with an automatic text mining method and additional manual inspection and partly from high throughput experiment data and public ubiquitination databases. The significant functions and pathways of the extracted E3-specific substrate groups were identified from a functional enrichment analysis with 12 functional category resources for molecular functions, protein families, protein complexes, pathways, cellular processes, cellular localization, and diseases. E3Net includes interactive analysis and navigation tools that make it possible to build an integrative view of E3-substrate networks and their correlated functions with graphical illustrations and summarized descriptions. As a result, E3Net provides a comprehensive resource of E3s, substrates, and their functional implications summarized from the regulatory network structures of E3-specific substrate groups and their correlated functions. This resource will facilitate further in-depth investigation of ubiquitination-dependent regulatory mechanisms. E3Net is freely available online at http://pnet.kaist.ac.kr/e3net. Molecular & Cellular Proteomics 11: 10.1074/mcp.O111.014076, 1–14, 2012.

Fairy Tale Summarization through Sentence Selection

SeungJoo An
MS Thesis, KAIST, 2012.

Probabilistic Filtering for a Biological Knowledge Discovery System with Text Mining and Automatic Inference

Hee-Jin Lee and Jong C. Park
Journal of the Korean Society Of Computer and Information, Vol. 17, No. 2, pp. 139-148, February 2012.
This paper deals with a system that integrates text mining and inference: it automatically extracts molecular-level event information from the biology literature through text mining and automatically infers new biological knowledge from the extracted events. Compared with systems that take as input only information that has already been extracted and registered in databases, a knowledge discovery system with this integrated architecture has the advantages of being able to use the latest information sooner and to use diverse information beyond predefined formats. On the other hand, because it uses the text mining extraction results as they are, the utility of the whole system may degrade considerably depending on the performance of the text mining module. In this paper, we propose a probabilistic filtering method that effectively removes false positives from the text mining results, aiming to raise the accuracy and utility of the whole knowledge discovery system. The proposed probabilistic filtering method outperformed the count-based filtering method used as the baseline.

Identifying Helpful Reviews Based on Customer’s Mentions about Experiences

Hye-Jin Min and Jong C. Park
Expert Systems With Applications, doi:10.1016/j.eswa.2012.01.116, January 25, 2012. (SCIE IF 1.924)
As numerous on-line product reviews that vary in quality are published every day, much attention is being paid to quality assessment of such reviews. The current metric of using the number of votes by other customers, such as ‘helpful vote’, despite its dominance, does not yield a fully effective outcome. In this article, we propose a novel metric to rank product reviews by ‘mentions about experiences’, accounting for customers’ personal experiences, as a way of identifying high quality reviews. The proposed metric has two parameters that capture, by linguistic clues, time expressions related to the use of products and product entities over different purchasing time periods. The empirical results show that this metric is not only as helpful as the best existing metrics, ‘helpful vote’ or ‘reviewer rank’, but is also free from undesirable biases that either penalize recency or are driven solely by popularity. Our usability study also shows that ordering reviews by our metric is considered helpful in terms of both usefulness and satisfaction.

Analyzing the Patterns of Switching and Clustering on CFT Data Using Hidden Markov Model

Yong-Jae Lee, Hee-Jin Lee, Maria Wolters, and Jong C. Park
HCI Conference Korea, Alpensia resort, January 11-13, 2012.
Early detection of dementia allows people to have more time to prepare themselves for its symptoms. As one of the methods to screen for dementia, the Category Fluency Test (CFT) is used to evaluate the organization of semantic memory and to assess the verbal fluency performance of patients with dementia. Recently, various measures to evaluate their CFT performance have been studied and, in particular, clusters and switches in the CFT data are considered important factors. In this work, we analyze the clusters and switches in the CFT data by using a Hidden Markov Model (HMM) to verify the hypothesis that a comprehensive pattern analysis of these switches and clusters can reveal important characteristics of verbal fluency performance.

Age Prediction from Korean Tweets with Style-Based Feature Analysis

Sang-Chae Kim and Jong C. Park
HCI Conference Korea, Alpensia resort, January 11-13, 2012.
Authorship attribution is the task of predicting the author of a text by analyzing his/her writing. The increasing popularity of the Internet has made it easy for authorship attribution researchers to access large corpora with annotated authorship. Such large corpora have enabled researchers to predict the authors’ demographic characteristics such as age. In this paper, we analyze tweets in Korean with a small number of style-based features such as emoticons and propose a way of using these features to predict the age group. Our prediction resulted in a relatively high accuracy of 0.75.

Analyzing Disagreements among ICD-9-CM Coders

Seung-Cheol Baek and Jong C. Park
4th International Symposium on Languages in Biology and Medicine (LBM 2011), Nanyang Technological University, Singapore, December 14-15, 2011.
NLP researchers find it difficult to acquire and interpret clinical free text directly, most likely because of their unfamiliarity with medical practices. This is why publicly available annotated corpora would be of much help, but there are still very few in the clinical domain due to patient confidentiality. In this regard, it is encouraging to see that the Computational Medicine Center’s 2007 Challenge provides a publicly available corpus consisting of radiology reports with ICD-9-CM codes as independently assigned by three different coders. However, the corpus shows many disagreements among the coders, making it imperative to set the standard correctly for their proper interpretation. A proposal for such a standard as implicitly advanced by its developers is to take the majority annotation. In this paper, we propose an alternative method to address such disagreements. We believe our work not only makes a meaningful improvement on the utility of this corpus but also has good implications for similar tasks, such as ICD-10-CM coding.

Identifying Gene Expression Changes in Prostate Cancer Cells from the Literature

Hee-Jin Lee, Hyunju Lee, and Jong C. Park
4th International Symposium on Languages in Biology and Medicine (LBM 2011), Nanyang Technological University, Singapore, December 14-15, 2011.
We propose to identify information about gene expression changes in diseased cells from the literature, utilizing event extraction techniques. A gene expression change in a diseased cell or tissue happens when the expression level of a gene is either higher or lower than its level in normal states. Such information can be critically used in the next stage of understanding the molecular mechanisms of the disease, leading naturally to its pathway. In this work, we focus on prostate cancer (PC), one of the most troubling cancers.

Detecting and Blocking False Sentiment Propagation

Hye-Jin Min and Jong C. Park
Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), pp. 354–362, Chiang Mai, Thailand, November 8-13, 2011.
Sentiment detection of a given expression involves interaction with its component constituents through rules such as polarity propagation, reversal or neutralization. Such compositionality-based sentiment detection usually performs better than a vote-based bag-of-words approach. However, in some contexts, the polarity of the adjectival modifier may not always be correctly determined by such rules, especially when the adjectival modifier characterizes the noun so that its denotation becomes a particular concept or an object in customer reviews. In this paper, we examine adjectival modifiers in customer review sentences whose polarity should either be propagated (SHIFT) or not (UNSHIFT). We refine polarity propagation rules in the literature by considering both syntactic and semantic clues of the modified nouns and the verbs that take such nouns as arguments. The resulting rules are shown to work particularly well in detecting cases of ‘UNSHIFT’ above, improving the performance of overall sentiment detection at the clause level, especially in ‘neutral’ sentences. We also show that even polarity that is not propagated is still necessary for identifying the implicit sentiment of the adjacent clauses.

Automatic Conversion of Korean into Korean Sign Language Based on Combinatory Categorial Grammar

Jong C. Park
Keynote Speech, Joint Conference of the Modern Linguistic Society of Korea and the Korean Society for Language and Information, Gongju National University of Education, Korea, November 5, 2011.

Text Parsing for Sign Language Generation with Combinatory Categorial Grammar

Jin-Woo Chung and Jong C. Park
2nd International Workshop on Sign Language Translation and Avatar Technology (SLTAT), 13th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS), University of Dundee, UK, October 23, 2011.
In this paper, we propose a method to convert a written sentence in spoken language into a suitable representation in sign language within the framework of Combinatory Categorial Grammar (CCG). The representation reflects the multi-channel nature of sign language performance, including manual and non-manual linguistic signals of multiple channels and information about their coordination. We show that most information needed to address linguistic phenomena in sign language such as word order, spatial references, classifier construction, and verb inflection can be encoded in the CCG sign lexicon. During the CCG derivation process, a semantic representation for sign language expressions is created so that the resulting output can be directly interpreted as a sequence of signs, each containing manual and non-manual components and representing their coordination and spatial relationship. The derivation process with the constructed lexicon is presented with several examples for Korean Sign Language. We discuss implications of our proposal and future directions.

Revisiting Concatenative Video Synthesis with Relaxed Constraints

Sangyong Gil and Jong C. Park
2nd International Workshop on Sign Language Translation and Avatar Technology (SLTAT), 13th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS), University of Dundee, UK, October 23, 2011.

Reproducing Fairy Tales for Plot Identification

SeungJoo An and Jong C. Park
Proceedings of the 23rd Annual Conference on Human and Cognitive Language Technology (HCLT), pp. 3-8, Seoul, Korea, October 6-7, 2011.
To understand the story of a text automatically, studies have identified the events described in the text and combined them to work out how the story is structured. However, besides requiring a deep semantic understanding of the story, this is limited in settings where language resources are scarce, because the situations and events vary from text to text. This problem can be solved without compromising the naturalness of story understanding if events can be abstracted and represented simply. In this paper, as a basic study toward event abstraction, we extract the events that characters in a text perform or undergo, identify the flow of events with the PMI technique, and supplement it by adding events that may be omitted during story understanding, with reference to linguistic clues. Through this approach, we present a method for reconstructing and simplifying the events that characters can perform.
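As an illustration only, the PMI step mentioned in the abstract above can be sketched as follows, assuming that PMI stands for pointwise mutual information and that two events "co-occur" when they appear in the same story. The function name `event_pair_pmi`, the story-level co-occurrence window, and the toy event labels are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def event_pair_pmi(documents):
    """Return PMI scores for unordered event pairs that co-occur in a document.

    Assumption (not from the paper): each document is a list of event labels,
    and probabilities are estimated as document frequencies.
    """
    event_counts = Counter()
    pair_counts = Counter()
    for events in documents:
        unique = sorted(set(events))  # count each event once per document
        event_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    n_docs = len(documents)
    scores = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n_docs
        p_a = event_counts[a] / n_docs
        p_b = event_counts[b] / n_docs
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
        scores[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return scores


if __name__ == "__main__":
    # Hypothetical event sequences extracted from four short stories.
    stories = [["meet", "marry"], ["meet", "marry"], ["meet", "fight"], ["sleep"]]
    for pair, score in sorted(event_pair_pmi(stories).items()):
        print(pair, round(score, 3))
```

Pairs with a higher score co-occur more often than their individual frequencies would suggest, which is one plausible way to rank candidate links in a flow of events. Pairs that never co-occur get no score in this sketch.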

Reading Desk for Preschool Children and Older People with Emotional Speech Synthesis

Ho-Joon Lee, Yong-Jae Lee, and Jong C. Park
International Conference on Convergence and Hybrid Information Technology (ICHIT), LNCS 6935, pp. 740-747, Daejeon, Korea, September 23-25, 2011.
In this paper, we introduce a reading desk designed to read books to older people and children. For this purpose, we propose a reading desk together with an emotional speech synthesis system for Korean. The reading desk system provides a wireless audio output unit, and the reading desk is directly connected to a laptop computer in order to identify the current user and target reading material. The emotional speech synthesis system for Korean is a prosody re-synthesis system that can express four different emotions: anger, fear, happiness, and sadness. The system can also modify the speech rate and intensity of speech as much as users want. We analyzed 240 pieces of emotional speech in order to extract distinct prosody structures for each emotion in Korean. The evaluation results show that we achieved a recognition rate of 48.5% for happiness among the four emotions, and that with enough training experience, the average recognition rate improved up to 95.5% for all emotions.

Linguistic Analysis of Picture Description for Language Impairment Diagnosis

Yong-Jae Lee, Hye-Jin Min, and Jong C. Park
Korea Computer Congress (KCC), Gyeongju, Korea, June 30-July 2, 2011.
People develop their own characteristic ways of using language depending on their upbringing and learning. These characteristics provide an indicator of an individual's language fluency, and analyzing them makes it possible to respond proactively to changes caused by impairment. However, research on identifying the language use characteristics of a particular individual is still lacking. In this study, as a first step toward identifying individual language use characteristics, we collected written picture descriptions from the general population and, based on their analysis, aim to identify language use characteristics that can be applied to the diagnosis of language impairment. As a result, we were able to partially identify individual language use characteristics at the morpheme level, at the word level, and in the manner of conveying content. Such characteristics are expected to provide important clues for tracking changes in language use caused by cognitive impairments such as dementia.

Research on Automatic Sign Language Generation: State of the Art and Future Directions

Jin-Woo Chung, Ho-Joon Lee, and Jong C. Park
Invited Presentation, the 13th Annual Conference on Korean Sign Language, Korea National College of Rehabilitation and Welfare, Pyeongtaek, Korea, June 11, 2011.

Improving Accessibility to Web Documents for the Aurally Challenged with Sign Language Animation

Jin-Woo Chung, Ho-Joon Lee, and Jong C. Park
International Conference on Web Intelligence, Mining and Semantics (WIMS'11), Sogndal, Norway, May 25-27, 2011.
In this paper, we describe how to improve accessibility for the aurally challenged in a web environment, focusing on utilizing a signing avatar for web pages. Many systems have previously been proposed to make the web environment more accessible for deaf people by providing signed expressions, i.e. translating written text into sign language animations and presenting them in a proper way, based on the observation that deaf users normally have much difficulty understanding text-based information as well as audio content. We analyze the strengths and weaknesses of these systems with respect to the design criteria discussed, and propose a system that presents a signing avatar for web page documents via a mobile device, which is expected to overcome the shortcomings of the previous systems and to improve the accessibility of textual content in a web environment for deaf users. The proposed system has three main parts based on a client-server architecture: 1) a client that executes a web browser and transmits selected text to the server, 2) a server that takes text as input and translates it into signed expressions through a sign language generation module, and 3) a mobile device that displays signing animation streamed from the server. We also present some linguistic issues raised by the differences between Korean and Korean Sign Language. To the best of our knowledge, this is the first approach to use a mobile device for web document access by aurally challenged people. We discuss implications of our study and future directions.

Natural Language Processing for the Aurally Challenged and for the Elderly

Jong C. Park
Research Seminar, University of Dundee, UK, March 16, 2011

Physical Push with a Socially Intelligent Robot: Make your wishes to 'Genie in the Lamp'

Hye-Jin Min and Jong C. Park
Proceedings of the 6th IEEE/ACM International Conference on Human-Robot Interaction, Late Breaking News, pp. 203-204, March 6-9, 2011, Lausanne, Switzerland. ACM
This paper proposes a robotic agent named 'Genie' that understands a user's wish and gives possible answers on a social network platform. Once a potential wish is detected by monitoring the text updates in the user's micro-blog, the agent initiates a task to help the user with both NLP and metadata analysis. As an interaction scenario, we set up the robot as an agent that identifies wished-for products by searching for and analyzing product information on the web. After analyzing the vast amount of data, the agent provides possible answers to the user as a way of granting a wish that might otherwise require additional time and effort to achieve. In order to draw the user's attention, the agent makes a physical movement as a more user-friendly push notification.

Annotation of Protein State Information in Biomedical Text

Hee-Jin Lee and Jong C. Park
9th Asia Pacific Bioinformatics Conference (APBC), Poster Presentation, Incheon, Korea, January 11-14, 2011.

Korean Speech Synthesis for Automatic Fairy Tale Narration with Automatic Identification of Character Roles

SeungJoo An, Ho-Joon Lee, and Jong C. Park
HCI Conference Korea, Alpensia resort, January 26-28, 2011.
As a growing number of parents leave their children alone while they work, a system that provides necessary services to children is needed. Among such services, an automatic fairy tale narration system can help the language and emotional development of young children. However, if the roles of the characters in a story cannot be determined correctly by such a system, the meaning of the fairy tale can be conveyed differently, if not distorted. In this paper, we propose an automatic role identification system that classifies such roles based on linguistic clues and, using the classified roles, a speech synthesis system for more natural and clear automatic fairy tale narration.

Identifying Sentence Types in Korean with Morpho-Syntactic Analysis

Jin-Woo Chung
MS thesis, KAIST, 2011.

Evaluation of Emotion Categories based on the Analysis of Emotion-Rich Fairy Tales

Ho-Joon Lee and Jong C. Park
HCI Conference Korea, Alpensia resort, January 26-28, 2011.
In this paper, we analyze the characteristics of emotion categories derived from the utterances of fairy tales. For this purpose, we extract the explicit emotional state of each utterance and calculate the distribution of these states. As a result, we find that anger and astonishment are well-defined emotion categories, whereas the others need more refinement. This finding can be used to improve emotional speech synthesis and recognition systems.

Multi-modal Assessment and Treatment to Retain and Enhance Human Performance in Ageing

Jong C. Park, Jinah Park, KiWoong Kim, and Joon-Kyung Seong
Discerning Diversity in Ageing - SBF/SBMN workshop, University of Edinburgh, UK, November 10, 2010.

Quality of Life Technology for the Aurally Challenged and for the Aged

Jong C. Park
Annual Seminar Series, University of Manchester, UK, November 24, 2010.

Automatic Identification of Character Roles for Natural Fairy Tale Narration

SeungJoo An and Jong C. Park
KIISE Fall Conference, Danguk University, November 5-6, 2010.
When narrating a fairy tale, the narrator speaks with emotion based on the roles of the characters in the story. This draws the interest of the young children in the audience and immerses them in the story, which improves their comprehension. A proper understanding of the roles of the characters is thus one of the important factors in automatic fairy tale narration. In this paper, we examine the linguistic elements that must be handled in order to classify the roles of characters in fairy tales. Based on this, we also present a system that automatically classifies and processes such roles.

A Ubiquitous Smart Parenting and Customized Education Service Robot

Ho-Joon Lee and Jong C. Park
The 2010 IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO), 2010.
In this paper, we introduce a u-SPACE service robot, designed to help children who may be left alone while their caregivers are away from home. In order to protect children from indoor dangers, this service robot detects dangerous objects and situations and then provides customized guiding messages that take into account the location and behavioral patterns of the child. These guiding messages are vocalized by our emotional speech generation system, which is also used to read fairy tales to the child as part of a home education service. The outward appearance of the u-SPACE service robot is modeled on a teddy bear, in order to provide a safe and comforting environment for children. Two touch sensors designed for basic interactions between a child and the robot are installed on each hand of the robot, and an RFID tag is placed inside the body. A PDA with a Wi-Fi communication module, a touch screen, and a speaker is used as the main operating device of the u-SPACE service robot.

Detecting and Resolving Syntactic Ambiguity for Automatic Korean-Korean Sign Language Translation

Jin-Woo Chung and Jong C. Park
Proceedings of the 22nd Annual Conference on Human and Cognitive Language Technology, pp. 55-62, 2010.
Sign language is a visual language used mainly in the deaf community, and it differs considerably in syntax from Korean, a spoken language. In particular, particles and endings are hardly used in sign language. As a result, when a sign language sentence is generated in the conventional way, by removing them from the Korean sentence and simply listing the base forms of the sentence constituents without considering word order, the syntactic relations among the constituents can become ambiguous. In this paper, we regard syntactic ambiguity as arising from particular syntactic structures that additionally emerge in the process of converting Korean sentences into sign language sentences. We classify these structures into basic argument structure, restrictive modification structure, coordinate structure, and predicative structure, identify each of them, and present a method for resolving the syntactic ambiguity accordingly.

Personal Prosody Model based Korean Emotional Speech Synthesis

Ho-Joon Lee
PhD dissertation, KAIST, 2010.
Speech is the most basic and widely used communication method for expressing thoughts during human-human interaction and has been studied for user-friendly interfaces between humans and machines. Recent progress in speech synthesis has produced artificial vocal results with very high intelligibility, but the quality of sound and the naturalness of inflection remain major issues. Today, in addition to the need for improvement in sound quality and naturalness, there is a growing need for a method for the generation of speech with emotions to provide the required information in a natural and effective way. For this purpose, various types of emotional expression are usually transcribed first into corresponding datasets, which are then used for the modeling of each type of emotional speech. This kind of massive dataset analysis technique has improved the performance of information providing services both quantitatively and qualitatively. In this dissertation, however, I argue that this approach does not work well with interactions that are based on personal experience such as emotional speech synthesis. We know empirically that individual speakers have their own ways of expressing emotions based on their personal experience, and that massive dataset management may easily overlook these personalized and relative differences. Therefore, this dissertation examines the emotional prosody structures of four basic emotions, namely anger, fear, happiness, and sadness, by considering their personalized and relative differences. As a result, this dissertation addresses the tendency for the emotional prosody structures of pitch and speech rate to depend more on individual speakers (i.e. personal information) than intensity and pause length do. This personal information enables the modeling of relative differences of each emotional prosody structure (i.e. personal prosody model), the possibilities of which were dismissed earlier during the application of the massive dataset analysis technique. Based on the personal prosody model, we develop a Korean emotional speech synthesis system that can add emotional information to spoken expressions. In order to convert an input sentence into speech, we used a commercial Korean TTS system with a female voice. The evaluation results show that we can successfully incorporate this personal information into an emotional prosody synthesis system, which enhances the recent progress in the recognition rate for happiness and other emotions. We have achieved a recognition rate of 48.5% for happiness among the four emotions, which used to be close to the chance level. From a series of repeated perception tests supported by enough prior training experience, the average recognition rate has improved up to 95.5% for all emotions. We also show the applicability of the proposed Korean emotional speech synthesis system with the implementation of a speech interface of assistive robots designed for the elderly that can modify its prosodic structure according to sentence types and emotional states.

Sentence Type Identification in Korean: Applications to Korean-Sign Language Translation and Korean Speech Synthesis

Jin-Woo Chung, Ho-Joon Lee, and Jong C. Park
Journal of the HCI Society of Korea, Vol. 5, No. 1, pp. 25-35, 2010.
(selected as best paper)
This paper proposes a method of automatically identifying sentence types in Korean and improving naturalness in sign language generation and speech synthesis using the identified sentence type information. In Korean, sentences are usually categorized into five types: declarative, imperative, propositive, interrogative, and exclamatory. However, it is also known that these types are quite ambiguous to identify in dialogues. In this paper, we present additional morphological and syntactic clues for the sentence type and propose a rule-based procedure for identifying the sentence type using these clues. The experimental results show that our method performs reasonably well. We also describe how the sentence type is used to generate non-manual signals in Korean-Korean sign language translation and appropriate intonation in Korean speech synthesis. Since the use of sentence type information in speech synthesis and sign language generation has not been studied much previously, we anticipate that our method will contribute to research on generating more natural speech and sign language expressions.

Automatic Sign Language Generation Reflecting the Relationship between Entities

SangYoon Jung
MS thesis, KAIST, 2010.

Wrestling with Biomedical Research Results: Language Resources and Literature Analysis

D. Rebholz-Schuhmann, Nigel Collier, Jong C. Park, and Limsoon Wong
Journal of Bioinformatics and Computational Biology (JBCB), Vol. 8, No. 1, pp. 129-130, Imperial College Press, February 2010.

Intonation Generation for Korean Speech Synthesis with Automated Sentence Type Classification

Jin-Woo Chung, Ho-Joon Lee, and Jong C. Park
21st HCI Conference Korea, Phoenix Park, January 27-29, 2010.
Speech is the most basic means of conveying information in human-human interaction, and it has recently come to be widely used as an effective means of natural interaction between humans and machines, including robots. Speech is a linguistic expression in written form converted into sound, and it carries intonation information. If this intonation is not expressed properly, even the information carried by the text is hard to convey fully, so expressing intonation that fits the situation is very important. In Korean speech, the overall intonation of a sentence varies with the type of the sentence, so the sentence type must be identified well for natural speech synthesis. In this paper, we therefore propose a sentence type classification system that automatically classifies the types of Korean sentences, and a speech synthesis system that generates intonation matching the classified sentence type to produce natural speech.

Extracting Melodies from Piano Solo Music Based on Characteristics of Music

Yoonjae Choi and Jong C. Park
Journal of the Korean Institute of Information Scientists and Engineers (KIISE): Computing Practices and Letters, Vol. 15, No. 12, pp. 923-927, 2009.
The recent growth of the digital music market has increased demand for music search and recommendation services. In order to improve the performance of music-based application services, the process of extracting melodies from polyphonic music is essential. In this paper, we propose a method to extract melodies from piano solo music, which is highly polyphonic and has a wide pitch range. We categorize piano music into three classes taking into account the characteristics of the music, and extract melodies according to each class. The performance evaluation of the implemented system showed that our method works successfully on a variety of piano solo music.

Automated Classification of Sentential Types in Korean with Morphological Analysis

Jin-Woo Chung and Jong C. Park
Language and Information, Vol. 13, No. 2, pp. 59-97, 2009.
The type of a given sentence indicates the speaker's attitude towards the listener and is usually determined by its final endings and punctuation marks. However, some final endings are used in several types of sentences, which means that we cannot identify the sentential type by considering only the final endings and punctuation marks. In this paper, we propose methods of finding other linguistic clues for identifying the sentential type through morphological analysis. We also propose to use these methods to implement a system that automatically classifies sentences in Korean according to their sentential types.

Automatic Extraction of the Usage Information from the Component Words in Gene Ontology Terms to Enhance Consistency and Predictability

Seung-Cheol Baek and Jong C. Park
3rd International Symposium on Languages in Biology and Medicine (LBM 2009), long paper, Seogwipo, Korea, November 8-10, 2009.
The Gene Ontology (GO) is a controlled vocabulary that has gone through constant changes, motivated primarily by the need to reflect the dynamic nature of knowledge it addresses and the need for usability improvement. A good policy on such changes would be to maintain consistency across terms and structures so as to highlight the missing parts that are likely to be added afterwards, or the unchanged parts to which a policy on usability improvement might not have yet applied. In particular, we argue that the component words inside terms must be used consistently across terms, in order to enhance the predictability of such terms, thus their usability as well. For this purpose, we propose a representation for word usage and a method for extracting it from GO and show its utility in identifying the direction of future changes readily as well as in enhancing the consistency of terms.

Synchronization of Manual and Non-Manual Signals in Automatic Generation of Sign Language Expressions

SangYoon Jung, Eunyoung Chang, and Jong C. Park
Proceedings of the 21st Annual Conference on Human and Cognitive Language Technology (HCLT 2009), pp. 81-86, October 2009.
(selected as best paper)
In communication through sign language, non-manual signals provide information that is no less important than that of manual signals. Nevertheless, research on non-manual signals is still far scarcer than research on manual signals. In this study, we analyze the characteristics of non-manual signals. Reproducing non-manual signals together with manual signals involves an accuracy problem and a synchronization problem, and we propose a system that addresses the synchronization problem. The implemented system consists of one part that parses the input sentence to determine the manual and non-manual signals, and another part that generates an action script for sign language animation from the parse result. Because the meaning of a signed expression can change depending on the order and manner in which manual and non-manual signals are connected, the synchronization of non-manual signals addressed in this study is a very important problem in automatic sign language generation.

Extracting Melodies from Piano Music Based on Characteristics of Music

Yoonjae Choi
MS thesis, KAIST, 2009.

Toward finer-grained sentiment identification in product reviews through linguistic and ontological analyses

Hye-Jin Min and Jong C. Park
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pp. 169-172, Singapore, August 2-7, 2009.
We propose categories of finer-grained polarity for a more effective aspect-based sentiment summary, and describe linguistic and ontological clues that may affect such fine-grained polarity. We argue that relevance for satisfaction, contrastive weight clues, and certain adverbials work to affect the polarity, as evidenced by the statistical analysis.

Text and Sign Language Animation with Combinatory Categorial Grammar

Jong C. Park
Invited talk, Institute of Communicating and Collaborative Systems (ICCS), University of Edinburgh, UK, July 3, 2009.

A Text Mining Tool for Ubiquitin-Protein Ligases

Jong C. Park
Invited talk, Centre for Systems Biology, University of Edinburgh, UK, July 8, 2009.

Interpretation of User Evaluation for Emotional Speech Synthesis System

Ho-Joon Lee and Jong C. Park
13th International Conference on Human-Computer Interaction (HCII 2009), San Diego, USA, July 19-24, 2009.
Whether it is for human-robot interaction or for human-computer interaction, there is a growing need for an emotional speech synthesis system that can provide the required information in a more natural and effective manner. In order to identify and understand the characteristics of basic emotions and their effects, we propose a series of user evaluation experiments on an emotional prosody modification system that can express either perceivable or slightly exaggerated emotions, classified into anger, joy, and sadness, as an independent module for a general-purpose speech synthesis system. In this paper, we propose two experiments to evaluate the emotional prosody modification module according to different types of initial input speech. We also provide a supplementary experiment to understand joy, an apparently prosody-independent emotion, by replacing the resynthesized joy speech with an original human voice recorded in the emotional state of joy.

Analysis and Computational Processing of Homonyms in Korean for Automatic Sign Language Generation

SangYoon Jung, Eunyoung Chang, and Jong C. Park
Proceedings of the Korea Computer Congress (KCC 2009), Vol. 36, No. 1(C), pp. 315-320, Jeju, July 1-3, 2009.
Most studies on automatically generating sign language from Korean prepare in advance a sign motion for each concept that Korean can express, and then try to generate signed expressions by connecting these motions naturally. However, this approach is limited when generating signs for Korean homonyms, because what hearing people regard as a single concept can be expressed in several different forms in the sign language used by deaf people. In this paper, we propose an automatic sign language generation system that addresses this limitation of existing generation methods for words that are used as a single concept among hearing people but expressed in several different forms by deaf people.

Extracting Melodies from Piano Solo Music Based on Characteristics of Music

Yoonjae Choi and Jong C. Park
Proceedings of the Korea Computer Congress (KCC 2009), Vol. 36, No. 1(A), pp. 124-125, Jeju, July 1-3, 2009.
(selected as best paper)
With the development of the Internet, research on how to search and use multimedia material is being actively pursued. In particular, the rapid growth of the digital music market keeps increasing the demand for music search and recommendation. To improve the performance of the music-based application systems that provide such services, extracting melodies from polyphonic music, the common form of music, is essential. In this paper, we propose a method to extract melodies from piano solo music, which can be highly polyphonic and cover a wide pitch range.

Extracting Melodies from Polyphonic Piano Solo Music Based on Patterns of Music Structure

Yoonjae Choi, Ho-Joon Lee, Hodong Lee, and Jong C. Park
Proceedings of the 20th Human Computer Interaction (HCI 2009), pp. 725-732, Phoenix Park, Feb 9-11, 2009.
Thanks to the development of the Internet, people can easily access a vast amount of music. This brings attention to application systems such as a melody-based music search service or music recommendation service. Extracting melodies from music is a crucial process to provide such services. This paper introduces a novel algorithm that can extract melodies from piano music. Since piano can produce polyphonic music, we expect that by studying melody extraction from piano music, we can help extract melodies from general polyphonic music.

Function-focused Gene Clustering by Utilizing Granularities of Gene Functions

Tak-eun Kim
MS thesis, KAIST, 2009.

Automatic Identification of the Relation between Dependency Relations and Definitions of GO Concepts

Seung-Cheol Baek
MS thesis, KAIST, 2009.

Analysis and Use of Intonation Features for Emotional States

Ho-Joon Lee and Jong C. Park
Proceedings of the 20th Annual Conference on Human and Cognitive Language Technology, pp. 144-149, October 11-12, 2008.
In this paper, we use as an emotional speech corpus a total of 240 utterances, consisting of 8 sentences spoken by 6 speakers in 5 emotional states, to analyze the intonation patterns that characterize each emotional state, and we discuss how to apply these patterns to a speech synthesis system. To this end, we analyze the characteristic intonation patterns of each emotional state with a focus on the length of intonation phrases, their phrase-final boundary tones, and declination, and we show how intonation features that can distinguish joy, sadness, anger, and fear are applied to the synthesis system. Through this study, we confirmed a rise in intonation in the emotion of anger and found characteristic intonation patterns for each emotion.

Towards Knowledge Discovery through Automatic Inference with Text Mining in Biology and Medicine

Hee-Jin Lee and Jong C. Park
3rd International Symposium on Semantic Mining in Biomedicine (SMBM), Turku, Finland, September 1-3, 2008.
Field experts in biology and medicine search the literature for state-of-the-art results and occasionally discover knowledge through manual inference on published causal relations. However, the results of such inference cannot be sufficiently accurate and/or complete, as the domain of published relations is rather huge. In this paper, we introduce an automatic inference system, BioDetective, which works on literature-mined qualitative causal information in biology and medicine. BioDetective provides proofs for such qualitative causal information, and predicts the existence of new causal information, if there is any. The system is tested with a case study, where literature-mined information about protein regulation is utilized to come up with new knowledge.

Computational Processing of Verb Agreement for Automatic Generation of Sign Language Animation

Sangha Kim
MS thesis, KAIST, 2008.

An effective way to learn biological knowledge with linguistic resources

Jin-Bok Lee, Tak-eun Kim, and Jong C. Park
18th International Congress of Linguists (CIL 18), Seoul, Korea, July 21-26, 2008.
The most general and effective way for people to acquire desired knowledge is to learn from tutors through face-to-face contact. Tutors can pick out important pieces of information and deliver them systematically to learners, considering their specialties, interests, rates of progress, and so on. However, since not all learners can be taught by tutors at a convenient time, the field of e-learning, or distance learning, has emerged.
To maintain the benefits of face-to-face learning in an automatic way, the challenge remains in equipping computers with the expertise, skills and modes of action of the human tutor, overcoming spatial, temporal, socio-economic and environmental restrictions. To address this challenge, we focus on two issues: (1) information investigation: how to pick out essential pieces of information that do not include overlapping or obsolete pieces, and (2) information delivery: how to deliver the selected pieces to learners effectively in terms of understanding and memorization.
In this paper, we propose a web-based smart tutoring system that helps biology majors learn about genes. To incorporate the two issues described above into our tutoring system, we make extensive use of linguistic resources in the biology domain, such as Gene Ontology or UMLS, for selecting and classifying information from a huge amount of data. We believe that our tutoring system can autonomously carry out almost all the functions of a human tutor, including investigation, delivery, and adaptation to learners' feedback.

Syntactic Construction of Coordination in Sign Language Generation

Hodong Lee, Sangha Kim, and Jong C. Park
18th International Congress of Linguists (CIL 18), Seoul, Korea, July 21-26, 2008.
Coordination in sign languages is an essential construction to describe more than one kind of information, as used in natural languages. Although it may appear to follow general rules of coordination, its realization with multi-channel motions is often quite different from that in natural languages, due to the differences at levels of syntax and semantics. A multi-channel motion is simultaneously composed of shape, position, orientation and movement of the hands, arms, body, or face. In this paper, we address the problems in converting coordination-bearing sentences into their matching motions in sign languages. In particular, we focus on the issues between the Korean language and the Korean sign language (KSL).

E3Miner: a text mining tool for ubiquitin-protein ligases

Hodong Lee, Gwansu Yi, and Jong C. Park
Nucleic Acids Research, Vol. 36, Web Server issue, doi:10.1093/nar/gkn286, published 15 May 2008 (SCI IF 8.026).
Ubiquitination is a regulatory process critically involved in the degradation of >80% of cellular proteins, where such proteins are specifically recognized by a key enzyme, or a ubiquitin-protein ligase (E3). Because of this important role of E3s, a rapidly growing body of the published literature in biology and biomedical fields reports novel findings about various E3s and their molecular mechanisms. However, such findings are neither adequately retrieved by general text-mining tools nor systematically made available by such protein databases as UniProt alone. E3Miner is a web-based text mining tool that extracts and organizes comprehensive knowledge about E3s from the abstracts of journal articles and the relevant databases, helping users easily grasp E3s and their related information from the available text. The tool analyzes text sentences to identify protein names for E3s, to narrow down target substrates and other ubiquitin-transferring proteins in E3-specific ubiquitination pathways and to extract molecular features of E3s during ubiquitination. E3Miner also retrieves E3 data about protein functions, other E3-interacting partners and E3-related human diseases from the protein databases, in order to help facilitate further investigation. E3Miner is freely available through http://e3miner.biopathway.org.

Monitoring the evolutionary aspect of the Gene Ontology to enhance predictability and usability

Jong C. Park, Tak-eun Kim, and Jinah Park
BMC Bioinformatics 2008, 9(Suppl 3):7 doi:10.1186/1471-2105-9-S3-S7, 11 April 2008.
Background: Much effort is currently being made to develop the Gene Ontology (GO). Due to the dynamic nature of the information it addresses, GO undergoes constant updates whose results are released at regular intervals as separate versions. Although there are a large number of computational tools to aid the development of GO, they operate on a particular version of GO, making it difficult for GO curators to anticipate the full impact of particular changes along the time axis on a larger scale. We present a method for tapping into such an evolutionary aspect of GO, by making it possible to keep track of important temporal changes to any of the terms and relations of GO and by consequently making it possible to recognize associated trends.
Results: We have developed visualization methods for viewing the changes between two different versions of GO by constructing a colour-coded layered graph. The graph shows both versions of GO with highlights to those GO terms that are added, removed and modified between the two versions. Focusing on a specific GO term or terms of interest over a period, we demonstrate the utility of our system that can be used to make useful hypotheses about the cause of the evolution and to provide new insights into more complex changes.
Conclusions: GO undergoes fast evolutionary changes. A snapshot of GO, as presented by each version of GO alone, overlooks such evolutionary aspects, and consequently limits the utilities of GO. The method that highlights the differences of consecutive versions or two different versions of an evolving ontology with colour-coding enhances the utility of GO for users as well as for developers. To the best of our knowledge, this is the first proposal to visualize the evolutionary aspect of GO.
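
The comparison of two ontology versions described in the abstract above can be illustrated with a small sketch. This is not the authors' visualization method; it only shows, under simplifying assumptions, how terms might be sorted into the added, removed and modified groups that the colour-coded graph highlights. The function name `diff_versions`, the dictionary representation, and the term IDs are all hypothetical.

```python
def diff_versions(old, new):
    """Classify term IDs as added, removed, or modified between two versions.

    `old` and `new` are assumed to map a term ID to its definition
    (any value that supports equality comparison).
    """
    old_ids, new_ids = set(old), set(new)
    added = sorted(new_ids - old_ids)
    removed = sorted(old_ids - new_ids)
    # A term counts as modified if it exists in both versions with different content.
    modified = sorted(t for t in old_ids & new_ids if old[t] != new[t])
    return {"added": added, "removed": removed, "modified": modified}


# Made-up term IDs and labels, for illustration only.
v1 = {"T:1": "binding", "T:2": "transport", "T:3": "catalysis"}
v2 = {"T:1": "binding", "T:2": "ion transport", "T:4": "signaling"}
print(diff_versions(v1, v2))
```

A real ontology would also need relation (edge) changes to be compared, which this sketch leaves out.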

Text Mining and Management Tools for Resource Construction and Validation in the Life Sciences

Jong C. Park
Dagstuhl Seminar on Text Mining and Ontologies for Life Sciences, Schloss Dagstuhl, Wadern, Germany, March, 2008.

Sign Language Generation with Animation by Adverbial Phrase Analysis

Sangha Kim and Jong C. Park
17th Human Computer Interaction (HCI 2008), Phoenix Park, Feb 13-15, 2008.
(selected as best paper)
Sign languages, commonly used in aurally challenged communities, are a kind of visual language expressing sign words with motion. Spatiality and motility of a sign language are conveyed mainly via sign words as predicates. A predicate is modified by an adverbial phrase with an accompanying change in its semantics, so that the adverbial phrase can also affect the overall spatiality and motility of expressions of a sign language. In this paper, we analyze the semantic features of adverbial phrases which may affect the motion-related semantics of a predicate in converting expressions in Korean into those in a sign language, and propose a system that generates corresponding animation by utilizing these features.

On the Automatic Generation of Illustrations for Events in Storybooks: Representation of Illustrative Events

Seung-Cheol Baek, Hee-Jin Lee, and Jong C. Park
17th Human Computer Interaction (HCI 2008), Phoenix Park, Feb 13-15, 2008.
Storybooks, especially those for children, may contain illustrations. An automated system for generating illustrations would help the production process of storybook publishing. In this paper, we propose a method for automatically generating layouts of objects when generating illustrations. In the generated layouts, unnecessary overlap between objects should be avoided, in line with the spatial information in storybooks. We first define a representation scheme for spatial information in natural language sentences using tree structures and predicate-argument structures. Unification of tree structures and Region Connection Calculus are then used to manipulate the information and generate corresponding illustrations.

Visualizing the Temporal Distribution of Terminologies for Biological Ontology Development

Tak-eun Kim, Hodong Lee, Jinah Park, and Jong C. Park
International Conference on Visualization and Data Analysis (VDA), San Jose, USA, 26-31 January, 2008.
Communities in biology have developed a number of ontologies that provide standard terminologies for the characteristics of various concepts and their relationships. However, it is difficult to construct and maintain such ontologies in biology, since it is a non-trivial task to identify commonly used potential member terms in a particular ontology, in the presence of constant changes of such terms over time as the research in the field advances. In this paper, we propose a visualization system, called BioTermViz, which presents the temporal distribution of ontological terms from the text of published journal abstracts. BioTermViz shows such a temporal distribution of terms for journal abstracts in the order of published time, occurrences of the annotated Gene Ontology concepts per abstract, and the ontological hierarchy of the terms. With a combination of these three types of information, we can capture the global tendency in the use of terms, and identify a particular term or terms to be created, modified, segmented, or removed, effectively developing biological ontologies in an interactive manner. In order to demonstrate the practical utility of BioTermViz, we describe several scenarios for the development of an ontology for a specific sub-class of proteins, or ubiquitin-protein ligases.

Interpretation of Natural Language Queries for Effective Data Exploration over Heterogeneous Databases: Applications to Biomedical Domain

Hodong Lee
PhD dissertation, KAIST, 2008.
Data exploration is an essential process for discovering novel knowledge in scientific research. However, it is difficult for field experts to find the target data by exploration alone, especially when the data are scattered over multiple and heterogeneous databases. Since such data are usually associated with one another, there may be appropriate sequences of searches that the field experts can use for queries to reach the target data. In order to help such data exploration, conventional database interfaces provide useful tools for querying in keywords or structured forms. However, we argue that they are inadequate to express the queries for sequences of searches in multiple databases which embody diverse relations among their data. In order to describe such queries in a convenient and expressive manner, we propose to use natural language queries (NLQs) to interact with the databases. Such a database interface shall automatically interpret NLQs into formal language queries (FLQs) that are in turn composed of small FLQs for different databases. This task requires us to address the problem of database heterogeneity due to the differences in formal query languages, database structures, and data contents. The dissertation addresses this problem by considering NLQs as terms and syntactic relations, which respectively correspond to data objects and their operations. We utilize SQL-like expressions to coordinate such terms and syntactic relations, resulting in FLQs via a straightforward mapping process. In this work, we present a method that derives the SQL-like expressions from NLQs in a Combinatory Categorial Grammar (CCG) framework, and then translates the expressions into the locations of data objects accessible from our target databases. The method then constructs FLQs for such locations in possible sequences, taking data associations into account. Our method thus provides a fully automated way to locate and retrieve available data from databases.
We also show that our method works as a useful interface serving data exploration and integration, which help the experts to discover knowledge from heterogeneous databases. As practical examples, we illustrate biomedical applications: protein-seeking for data exploration, a ubiquitin-protein ligase (E3) database for data integration, and an E3 data mining tool for further data integration.

Analysis of Indirect Uses of Interrogative Sentences Carrying Anger

Hye-Jin Min and Jong C. Park
PACLIC 21, Seoul National University, November 1-3, 2007.
Interrogative sentences are generally used to perform speech acts of directly asking a question or making a request, but they are also used to convey such speech acts indirectly. In utterances, such indirect uses of interrogative sentences usually carry the speaker's emotion with a negative attitude, which is close to an expression of anger. The identification of such negative emotion is known to be a difficult problem that requires relevant information in syntax, semantics, discourse, pragmatics, and speech signals. In this paper, we argue that the interrogatives used for indirect speech acts could serve as a dominant marker for identifying emotional attitudes, such as anger, as compared to other emotion-related markers, such as discourse markers, adverbial words, and syntactic markers. To support such an argument, we analyze dialogues collected from Korean soap operas, and examine individual or cooperative influences of the emotion-related markers on emotional realization. The user study shows that the interrogatives could be utilized as a promising device for emotion identification.

On the Automatic Generation of Illustrations for Events in Storybooks

Seung-Cheol Baek, Eunyoung Chang, and Jong C. Park
KIISE 2007 Fall Conference, Pusan National University, October 26-27, 2007.
The boundary between professional writers and the general public is blurring with the rise of Internet novels and the like. People who write for an audience of children want to publish their works with illustrations. This paper discusses a method for automatically generating an illustration when a user wants one on the theme of a particular event in a children's story. In particular, we propose a method for drawing a single event, expressed as a combination of sentences, as an illustration. We use Combinatory Categorial Grammar to interpret natural language and extract events.

Translating a Complex Sentence in Korean into a Sign Language Script for an Automatic Sign Language Generation

Sangha Kim, Eunyoung Chang, and Jong C. Park
the 19th Annual Conference on Human and Cognitive Language Technology (KLIP 2007), Kyungpook National University, October 12-13, 2007.

Characteristics of Spoken Discourse Markers and their Application to Speech Synthesis Systems

Ho-Joon Lee and Jong C. Park
the 19th Annual Conference on Human and Cognitive Language Technology (KLIP 2007), Kyungpook National University, October 12-13, 2007.

Customized Message Generation and Speech Synthesis in Response to the Characteristic Behavioral Patterns of Children

Ho-Joon Lee and Jong C. Park
HCI International, Beijing, P. R. China, July 22-27, 2007.
There is a growing need for a user-friendly human-computer interaction system that can respond to various characteristics of a user in terms of behavioral patterns, mental state, and personalities. In this paper, we present a system that generates appropriate natural language spoken messages with customization for user characteristics, taking into account the fact that human behavioral patterns usually reveal one's mental state or personality subconsciously. The system is targeted at handling various situations for five-year-old kindergarteners by giving them caring words during their everyday lives. With the analysis of each case study, we provide a setting for a computational method to identify user behavioral patterns. We believe that the proposed link between the behavioral patterns and the mental state of a human user can be applied to improve not only user interactivity but also believability of the system.

Accessing and Managing Massive Information Resources using Natural Language Processing

Jong C. Park
Invited talk, KISTI, Daejeon, Korea, May, 2007.

Natural Language Processing and Combinatory Categorial Grammar

Jong C. Park
Invited talk, Korea Institute of Science and Technology (KIST), Seoul, Korea, April, 2007.

Combinatory Categorial Grammar: Fundamental Issues in the State-of-the-Art

Jong C. Park
Korean Society for Language and Information (KSLI), a tutorial presentation at a monthly meeting, Seoul, Korea, April, 2007.

Creating Biomedical Resources with NLP-based Information Extraction

Jong C. Park
Invited talk, Tokyo Forum on Advanced NLP and TM (T-FaNT 07), Tokyo, Japan, March 11-13, 2007.

Representing Emotions with Linguistic Acuity

Hye-Jin Min and Jong C. Park
Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Mexico City, Mexico, February 18-24, 2007.
For a robot to make effective and friendly interaction with human users, it is important to keep track of emotional changes in utterance properly. Emotions have traditionally been characterized by intuitive but atomic categories or as points in evaluation-activity dimensions. However, this characterization falls short of capturing subtle emotional changes either in narration or in text, where the vast majority of information is presented with a host of linguistic constructions that convey emotional information. We propose a novel representation scheme for emotions, so that such important features as duration, target and intensity can also be treated as first-class citizens and systematically accounted for. We argue that it is with this new mode of representation that the subtlety of the emotional flow in utterance can be properly addressed. We use this representation to encode the emotional states and intentions of characters in the drama scripts for soap opera and describe how it is utilized in conjunction with parsing for lexicalized grammars.

Identifying Emotional Cues in Dialogue Sentences According to Targets

Hye-Jin Min and Jong C. Park
HCI Conference Korea, Phoenix Park, February, 2007.
In everyday conversation or computer-mediated conversation, self-disclosure is a process of sharing personal information with each other in order to maintain an intimate relationship. Personal information in self-disclosure covers thoughts and experiences as well as emotions; emotions in particular act as an effective means of communication for setting the mood of a conversation and keeping it flowing smoothly. Emotional disclosure during conversation varies in the actual intensity of expression and in the degree of disclosure, depending on the conversation partner (the disclosure target) and the object of the emotional expression (the expression target). In this study, we identify the emotional states of participants in conversations held through an instant messenger, which allows users to exchange messages and transfer files over the Internet, taking both the disclosure target and the expression target into account. As a preliminary study, we analyze the emotional expression patterns of characters in drama scripts. Using these patterns, we apply syntactic and semantic analysis to dialogue sentences with different disclosure targets to identify the participants' emotional states according to the expression target, and we provide an interface through which participants can observe their own emotions.

Searching Animation Models with a Lexical Ontology for Text Animation

Eunyoung Chang, Hee-Jin Lee, and Jong C. Park
HCI Conference Korea, Phoenix Park, February, 2007.

Customized and Selective Interpretation

Jong C. Park
6th Singapore-Korea Joint Workshop on Bioinformatics and Natural Language Processing, Singapore, February 12, 2007.

Automatic Data Integration of Ubiquitin-protein Ligases

Hodong Lee and Jong C. Park
6th Singapore-Korea Joint Workshop on Bioinformatics and Natural Language Processing, Singapore, February 12, 2007.

Extracting Relational Information for Protein Pairs

Hee-Jin Lee and Jong C. Park
6th Singapore-Korea Joint Workshop on Bioinformatics and Natural Language Processing, Singapore, February 12, 2007.

An Ontology-based Approach to Generation of Gene Summaries

Tak-eun Kim and Jong C. Park
6th Singapore-Korea Joint Workshop on Bioinformatics and Natural Language Processing, Singapore, February 12, 2007.

Audioization over Visualization for Effective Knowledge Discovery

Ho-Joon Lee and Jong C. Park
6th Singapore-Korea Joint Workshop on Bioinformatics and Natural Language Processing, Singapore, February 12, 2007.

Resource-bound Information Animation with a Lexical Ontology

Eunyoung Chang and Jong C. Park
6th Singapore-Korea Joint Workshop on Bioinformatics and Natural Language Processing, Singapore, February 12, 2007.

Text Analysis for Facial Animation of Non-Manual Information in Sign Language Generation

Sangha Kim and Jong C. Park
6th Singapore-Korea Joint Workshop on Bioinformatics and Natural Language Processing, Singapore, February 12, 2007.

Extracting Relational Information for Protein Pairs

Jin-Bok Lee, Tak-eun Kim, Chan-Goo Kang, and Jong C. Park
6th Singapore-Korea Workshop on Bioinformatics and Natural Language Processing, Institute for Infocomm Research, Singapore, February 12, 2007.

Document Similarity Assessment with Natural Language Processing: Applications to Background Music Recommendation for Blog Articles

Doojin Park
MS thesis, KAIST, 2007.

Reducing manual curation in Gene Ontology extension

Jin-Bok Lee and Jong C. Park
5th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing, Daejeon, Korea, November 17, 2006.

Exploring a cascade of databases for cancer-related target identification

Hodong Lee and Jong C. Park
5th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing, Daejeon, Korea, November 17, 2006.

Construction of Emotion-related Pathway for Emotional Disorder Diagnosis - Case Study: Serotonin-related Protein Pathway -

Hye-Jin Min and Jong C. Park
5th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing, Daejeon, Korea, November 17, 2006.

Bidirectional Incremental Approach to Efficient Information Extraction: Applications to Biomedicine

Jung-jae Kim
PhD dissertation, KAIST, 2006.
(Outstanding Ph.D. Dissertation Award, 2006. 8.)
Information extraction refers to the task of extracting relevant information from texts. This dissertation targets the extraction of relations between biomedical concepts, which are explicitly represented with known linguistic structures in biomedical texts. Such structures of a target relation involve a keyword and its semantic arguments, where the keyword indicates the semantic type of the target relation and the arguments indicate the related concepts. The information of relations thus has two types of locality: the information is expressed in the local context of the keyword, called spatial locality, and the keyword has well-known syntactic relations with its arguments, called structural locality. These two types of locality have in the past been handled by pattern matching and partial parsing approaches, respectively, but not at the same time. In this dissertation, we address this problem with a novel approach that searches for the arguments both bidirectionally and incrementally from the keywords. The extraction process is divided into two steps. First, it uses a non-structured pattern that describes a context between a keyword and its arguments, to match an input sentence bidirectionally from the keyword. It then performs syntactic analysis incrementally on candidate arguments and, if necessary, on their sentential contexts as well, with a parser of a combinatory categorial grammar for rigorous syntactic verification of the candidates. The approach addresses the aforementioned spatial locality by utilizing non-structured patterns and the structural locality by employing a lazy evaluation parser that is customized for information extraction. The approach is highly efficient, as evidenced by experimental results, because it can stop the extraction process at any step when the syntactic analysis gives a negative piece of evidence for extracting relevant information.
We also show the applicability of the approach with two different tasks in biomedicine: biological interactions, which are useful for building up biological pathways, and protein-protein contrastive relations, which are useful for refining protein pathways.

Natural Language Processing and Bioinformatics

Jong C. Park
Norwegian University of Science and Technology (NTNU), Trondheim, Norway, June, 2006 (invited lecture)

Customized Emotion Representation for Automatic Generation of Emotionally Appropriate Dialogs

Hye-Jin Min and Jong C. Park
the Korean Society for Emotion & Sensibility, KIST, May, 2006.
In this study, we analyze a corpus of dialogues between users and a system that delivers movie information and recommends movies, and discuss methods for identifying and describing the universal or individual emotional information that appears in the dialogues. Linguistic information that expresses emotion is extracted automatically from the dialogues using natural language processing techniques, and is used to generate dialogue responses that carry emotion. We evaluate the adequacy and usefulness of the proposed description method through a natural language processing based analysis of the dialogue corpus, and present the results.

Personalized Background Music Recommendation System for User Generated Contents using Collective Intelligence

Doojin Park and Jong C. Park
the Korean Society for Emotion & Sensibility, KIST, May, 2006.
Recently, many users of blog services such as Cyworld post their writing together with background music that matches it. Users pick music that they like or that they judge to fit the mood of the text, but selecting appropriate music is not easy. Existing music recommendation systems, meanwhile, use affective information entered by experts who analyze a given piece according to music theory, or affective information obtained by analyzing the waveform of the music; by the nature of music, however, the feelings it evokes differ with individual disposition. In this study, we propose a system that analyzes the text a user posts to a blog with natural language processing techniques to extract contextual information, including the affective information the text carries, and that automatically recommends background music matching this information while taking user information into account.

Text Mining and Management in Biomedicine

Jong C. Park, Gary Geunbae Lee, and Limsoon Wong
Guest Editors' Introduction to the Special Issue, ACM Transactions on Asian Language Information Processing (TALIP), March, 2006.

BioContrasts: Extracting and Exploiting Protein-Protein Contrastive Relations from Biomedical Literature

Jung-jae Kim, Zhuo Zhang, Jong C. Park, and See-Kiong Ng
Bioinformatics, Vol. 22, No. 5, pp. 597-605, March, 2006.
Motivation: Contrasts are useful conceptual vehicles for learning processes and exploratory research of the unknown. For example, contrastive information between proteins can reveal what similarities, divergences and relations there are between the two proteins, leading to invaluable insights for better understanding of the proteins. Such contrastive information is reported in the biomedical literature. However, there have been no reported attempts in current biomedical text mining work that systematically extract and present such useful contrastive information from the literature for exploitation.

Results: Our BioContrasts system extracts protein-protein contrastive information from MEDLINE abstracts and presents the information to biologists in a web-application for exploitation. Contrastive information is identified in the text abstracts with contrastive negation patterns such as 'A but not B'. A total of 799 169 pairs of contrastive expressions were successfully extracted from 2.5 million MEDLINE abstracts. Using grounding of contrastive protein names to Swiss-Prot entries, we were able to produce 41 471 pieces of contrasts between Swiss-Prot protein entries. These contrastive pieces of information are then presented via a user-friendly interactive web portal that can be exploited for applications such as the refinement of biological pathways.

Availability: BioContrasts can be accessed at http://biocontrasts.i2r.a-star.edu.sg. It is also mirrored at http://biocontrasts.biopathway.org

Supplementary information: Supplementary materials are available at Bioinformatics online.

Contact: skng@i2r.a-star.edu.sg; park@cs.kaist.ac.kr
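
The contrastive negation pattern 'A but not B' named in the abstract above can be illustrated with a toy matcher. This is only a minimal sketch under strong assumptions: names are single tokens, and the regular expression, the function name `find_contrasts`, and the example sentence are all hypothetical. It is not the BioContrasts implementation, which also grounds protein names to Swiss-Prot entries.

```python
import re

# Hypothetical toy pattern: two single-token names joined by "but not".
CONTRAST = re.compile(r"\b([A-Za-z0-9-]+) but not ([A-Za-z0-9-]+)\b")


def find_contrasts(sentence):
    """Return (A, B) pairs for every 'A but not B' match in the sentence."""
    return CONTRAST.findall(sentence)


# Made-up sentence with placeholder names, for illustration only.
print(find_contrasts("The ligand binds to PROT1 but not PROT2."))
```

A surface pattern like this over-generates on ordinary words, which is one reason a real system would restrict the slots to recognized protein names.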

Biomedical Text Mining for Knowledge Discovery and Automatic Ontology Extension

Jong C. Park
Invited presentation, Workshop on Text Mining, Ontology, and NLP in Biomedical Fields, Manchester, England, March 20-21, 2006.

u-SPACE: Ubiquitous Smart Parenting and Customized Education

Hye-Jin Min, Doojin Park, Eunyoung Chang, Ho-Joon Lee, and Jong C. Park
HCI Conference Korea, Phoenix Park, February, 2006.
As parents spend more time on activities outside the home, children are spending more time at home alone. There is thus a need for assistance that protects children from the indoor hazards they are easily exposed to, without greatly restricting their independence, and that gives them appropriate guidance according to their psychological and emotional state. In this study, we protect children from physical danger on the basis of RFID technology, and use natural language processing techniques to provide multimedia content, namely music and animation, suited to the child's psychological and emotional state. The system also provides information such as schedule management, which requires continuous attention, and instructions for using electronic appliances that help in daily life, so that children can take care of their own tasks. We design a virtual home and present the results of simulating these services around feasible scenarios.

Customized Speech Synthesis for Children with Characteristic Behavioral Patterns

Ho-Joon Lee and Jong C. Park
HCI Conference Korea, Phoenix Park, February, 2006.
μŒμ„±μ„ ν†΅ν•œ μ‚¬μš©μž κ°„μ˜ 정보 κ΅ν™˜ 방법은 좔가적인 ν›ˆλ ¨ κ³Όμ •μ΄λ‚˜ μž₯λΉ„κ°€ ν•„μš”ν•˜μ§€ μ•Šκ³  곡간 μ œμ•½μ΄ 거의 μ—†κΈ° λ•Œλ¬Έμ— λ…Έμ•½μž λ“± μ‚¬μš©μžμ˜ μ—°λ ΉλŒ€μ— 관계없이 μ‚¬μš©λ  수 μžˆλ‹€. λ˜ν•œ μŒμ„± μ •λ³΄λŠ” μ‹œκ°μ΄λ‚˜ 촉각 λ“± λ‹€λ₯Έ 정보 수 λ‹¨κ³Όμ˜ μƒν˜Έ μž‘μš©μœΌλ‘œ μƒμŠΉ 효과λ₯Ό μœ λ°œν•  수 있기 λ•Œλ¬Έμ— μ‚¬λžŒκ³Ό 기계 사이 의 μΈν„°νŽ˜μ΄μŠ€λ‘œ ν™œμš©λ  경우 정보 전달λ ₯을 λ†’μ΄λ©΄μ„œ μ‚¬μš©μž μΉœν™”μ μΈ μ„œλΉ„ 슀λ₯Ό μ œκ³΅ν•  수 μžˆλ‹€. κ·ΈλŸ¬λ‚˜ λ™μΌν•œ μƒν™©μ—μ„œ λ™μΌν•œ μœ ν˜•μ˜ μŒμ„± 정보가 μ‚¬μš©μžμ—κ²Œ μ§€μ†μ μœΌλ‘œ 제곡될 경우 ν‘œν˜„μƒμ˜ λ‹¨μ‘°λ‘œμ›€μœΌλ‘œ 인해 정보 전달 λ ₯이 급감할 수 μžˆλŠ” λ¬Έμ œμ λ„ μ§€λ‹ˆκ³  μžˆλ‹€. λ”°λΌμ„œ μŒμ„±μ„ ν†΅ν•œ 정보 전달 의 경우 동일 상황이라 ν•˜λ”λΌλ„ μ‚¬μš©μžμ˜ 행동 νŒ¨ν„΄, 심리 μƒνƒœ, μ£Όλ³€ ν™˜κ²½ 등에 따라 μ°¨λ³„ν™”λœ λ¬Έμž₯ ꡬ쑰 및 μ–΄νœ˜μ˜ μ„ νƒμœΌλ‘œ κΈ΄μž₯감을 μœ μ§€μ‹œμΌœ 쀄 수 μžˆμ–΄μ•Ό ν•œλ‹€. λ³Έ λ…Όλ¬Έμ—μ„œλŠ” 5 μ„Έ μ „ν›„μ˜ 어린이λ₯Ό λŒ€μƒμœΌλ‘œ κ·Έλ“€μ˜ 행동 패 ν„΄ 뢄석에 κΈ°λ°˜ν•˜μ—¬ κ°œλ³„ν™”λœ μŒμ„± ν•©μ„± κ²°κ³Όλ₯Ό μ œκ³΅ν•˜λŠ” μ‹œμŠ€ν…œμ„ μ œμ•ˆν•œ λ‹€. 이λ₯Ό μœ„ν•΄ μœ μΉ˜μ›μ΄λΌλŠ” 물리적 κ³΅κ°„μ—μ„œ μ–΄λ¦°μ΄λ“€μ˜ 주된 행동 νŒ¨ν„΄μ„ λΆ„μ„ν•˜κ³ , ν˜„μ§ μœ μΉ˜μ› ꡐ사λ₯Ό λŒ€μƒμœΌλ‘œ λ™μΌν•œ 정보λ₯Ό μ „λ‹¬ν•˜λŠ” 쑰건을 톡 ν•˜μ—¬ μ–΄λ¦°μ΄μ˜ 행동 νŒ¨ν„΄κ³Ό μœ„μΉ˜ 정보, μ—°λ Ή 및 성격에 λ”°λ₯Έ λ°œν™” λ¬Έμž₯의 λ¬Έ μž₯ ꡬ쑰와 μ–΄νœ˜μ  νŠΉμ„±μ„ νŒŒμ•…ν•œλ‹€. μ΅œμ’…μ μœΌλ‘œ, κ°œλ³„ν™”λœ μŒμ„± ν•©μ„± κ²°κ³Όλ₯Ό μœ„ν•΄ μœ μΉ˜μ› 곡간을 μ‹œλ¬λ ˆμ΄μ…˜ ν•˜κ³  RFID λ₯Ό μ΄μš©ν•˜μ—¬ μ–΄λ¦°μ΄μ˜ 행동 νŒ¨ν„΄ 및 μœ„μΉ˜ 정보λ₯Ό νŒŒμ•…ν•œλ‹€. 그리고 각 상황에 따라 λΆ„μ„λœ λ°œν™”λ¬Έμ˜ λ¬Έμž₯ ꡬ 쑰와 μ–΄νœ˜ νŠΉμ„±μ„ λ°˜μ˜ν•˜μ—¬ μŒμ„±μœΌλ‘œ 합성될 λ¬Έμž₯의 λ¬Έμž₯ ꡬ쑰 및 μ–΄νœ˜λ₯Ό 재 κ΅¬μ„±ν•˜μ—¬ μ‚¬μš©μž κ°œλ³„ν™”λœ μŒμ„± ν•©μ„± κ²°κ³Όλ₯Ό μƒμ„±ν•œλ‹€. μ΄λŸ¬ν•œ κ²°κ³Όλ₯Ό 톡해 μ–΄λ¦°μ΄μ˜ 행동 νŒ¨ν„΄μ΄ λ°œν™”λ¬Έμ˜ λ¬Έμž₯ ꡬ쑰 및 μ–΄νœ˜μ— λ―ΈμΉ˜λŠ” 영ν–₯에 λŒ€ν•΄μ„œ μ‚΄νŽ΄λ³΄κ³  μž¬κ΅¬μ„±λœ κ²°κ³Ό λ°œν™”λ¬Έμ„ ν‰κ°€ν•œλ‹€.

Visualization for Digesting a High Volume of the Biomedical Literature

Changsu Lee, Jinah Park, and Jong C. Park
Bioinformatics and Biosystems, Vol. 1, No. 1, pp. 51-60, Feb. 2006.
The paradigm in biology is currently changing from that of conducting hypothesis-driven individual experiments to that of utilizing the results of a massive data analysis with appropriate computational tools. We present LayMap, an implemented visualization system that helps the user to deal with a high volume of the biomedical literature such as MEDLINE, through the layered maps that are constructed on the results of an information extraction system. LayMap also utilizes filtering and granularity for an enhanced view of the results. Since a biomedical information extraction system gives rise to a focused and effective way of slicing up the data space, the combined use of LayMap with such an information extraction system can help the user to navigate the data space in a speedy and guided manner. As a case study, we have applied the system to datasets of journal abstracts on 'MAPK pathway' and 'bufalin' from MEDLINE. With the proposed visualization, we have successfully rediscovered pathway maps of a reasonable quality for ERK, p38 and JNK. Furthermore, with respect to bufalin, we were able to identify the potentially interesting relation between the Chinese medicine Chan su and apoptosis with a high level of detail.

Effective text visualization for biomedical information

Tak-eun Kim and Jong C. Park
HCI Conference Korea, Phoenix Park, February, 2007.
The amount of information in the biomedical field is growing very rapidly. Much research has applied text mining techniques to extract useful information from this vast body of material. Even the extracted information, however, is voluminous and, being textual, is hard to grasp intuitively. An information visualization system is therefore essential for understanding it more intuitively. Information visualization has recently been studied extensively, but the visualized information is itself so voluminous that a way of filtering for the information the user needs is required. The visualization system should also provide a means of knowledge discovery. In this paper, we focus on text visualization of biomedical information and propose an effective way of representing such information together with an intuitive interface for knowledge discovery.

Automatic Extension of Gene Ontology with Induced Prediction and Flexible Validation of Candidate Terms

Jin-Bok Lee and Jong C. Park
5th Singapore-Korea Workshop on Bioinformatics and Natural Language Processing, National University of Singapore, Singapore, February 22, 2006.

Augmenting Visualization with Audioization for Enhanced Knowledge Discovery

Ho-Joon Lee and Jong C. Park
5th Singapore-Korea Workshop on Bioinformatics and Natural Language Processing, National University of Singapore, Singapore, February 22, 2006.

Explorative search with relational description of biological entities into multiple heterogeneous databases

Hodong Lee and Jong C. Park
5th Singapore-Korea Workshop on Bioinformatics and Natural Language Processing, National University of Singapore, Singapore, February 22, 2006.

Diagnoses of Emotional Disorders for Amygdala-related Pathway with BioIE

Hye-Jin Min and Jong C. Park
5th Singapore-Korea Workshop on Bioinformatics and Natural Language Processing, National University of Singapore, Singapore, February 22, 2006.

Towards an efficient CCG parser for RNA secondary structure prediction

Hee-Jin Lee and Jong C. Park
5th Singapore-Korea Workshop on Bioinformatics and Natural Language Processing, National University of Singapore, Singapore, February 22, 2006.

Term Characterization for Information Extraction with Syntactic Pattern Matching

Jung-jae Kim and Jong C. Park
5th Singapore-Korea Workshop on Bioinformatics and Natural Language Processing, National University of Singapore, Singapore, February 22, 2006.

Named Entity Recognition

Jong C. Park and Jung-jae Kim
Chapter six of the book 'Text Mining for Biology', editors: Ben Stapley and Sophia Ananiadou, Artech House Books, January, 2006.

Semantic Representation for Temporal Adverbs and Temporal Morphemes

Eunyoung Chang and Jong C. Park
Proceedings of Annual Conference of the KSLI (Korea Society for Language and Information), pp. 193-207, Kangwon, Korea, 2006.
In a sentence, a situation is described mainly by its predicate, while the temporal meaning of the situation is expressed separately by temporal expressions. Among these, temporal adverbs and tense-aspect morphemes (prefinal endings) are known to contribute decisively to tense and aspect, but because several constituents appear together within a sentence, there is not yet agreement on the meaning and function of each constituent. In this paper, we classify the temporal properties of situations, analyze how temporal adverbs and tense-aspect morphemes affect each property, and propose a semantic representation at the lexical level. Temporal adverbs modify the location of the situation time or the temporal properties of the situation, and tense-aspect morphemes express the relation between utterance time and situation time or the speaker's attitude toward the situation. On this basis, we present appropriate lexical categories and show, as a derivation in Combinatory Categorial Grammar, how the final meaning is obtained from their combination.

Linguistic Characterization of Sign Language Expressions for an Automatic Mapping from Natural Language Sentences

Jiwon Choi, Eunyoung Chang, Hee-Jin Lee, and Jong C. Park
Language and Information, Vol. 10, No. 1, pp. 71-91, 2006.

Generation of Coherent Gene Summary

Chan-Goo Kang
MS thesis, KAIST, 2006.

Automatic extension of Gene Ontology with flexible identification of candidate terms

Jin-Bok Lee, Jung-jae Kim, and Jong C. Park
Bioinformatics, Vol. 22 No. 6, pp. 665-670, 2006.
Motivation: Gene Ontology (GO) has been manually developed to provide a controlled vocabulary for gene product attributes. It continues to evolve with new concepts that are compiled mostly from existing concepts in a compositional way. If we consider the relatively slow growth rate of GO in the face of the fast accumulation of the biological data, it is highly desirable to provide an automatic means for predicting new concepts from the existing ones.

Results: We present a novel method that predicts more detailed concepts by utilizing syntactic relations among the existing concepts. We propose a validation measure for the automatically predicted concepts by matching the concepts to biomedical articles. We also suggest how to find a suitable direction for the extension of a constantly growing ontology such as GO.

Availability: http://autogo.biopathway.org

Contact: park@nlp.kaist.ac.kr

Supplementary information: Supplementary materials are available at Bioinformatics online.

CCG-based RNA Secondary Structure Prediction for Structural Homology Analysis

Hee-Jin Lee and Jong C. Park
16th International Conference on Genome Informatics (GIW), Yokohama, Japan, December, 2005.
Various systems have been proposed to predict secondary structures of RNAs using their sequence information. Among them, Uemura et al. [2] described a system that recognizes some typical RNA secondary structures such as hairpin loops and pseudoknots with Tree Adjoining Grammar. However, their work captures only known sub-structures, and not those unknown sub-structures that might also exist. Ternary pseudoknot, composed of three pairs of cross-serially arranged reverse-complementary sequences, may be one such example. Figure 1 illustrates an example ternary pseudoknot. We describe a version of Combinatory Categorial Grammars (CCGs) for an RNA secondary structure prediction system to discover such unknown sub-structures. The parser for the proposed CCG takes an RNA sequence and produces the semantics string that contains structural information of the sequence.

From Text to Sign Language: Exploiting the Spatial and Motioning Dimension

Jiwon Choi, Hee-Jin Lee, and Jong C. Park
Proceedings of the 19th Pacific Asia Conference on Language, Information and Computation (PACLIC 19), pp. 61-69, Taipei, Taiwan, December, 2005.
In this paper, we address the problem of automatically converting information in the Korean language to one in a sign language as used in Korea. First, we discuss the differences between sign language and natural language, and in particular between the sign language in Korea and the Korean language. Then, we focus on issues that are relevant to the process of converting expressions in Korean into their counterparts in the sign language, including: 1) making explicit elided subjects of expressions in Korean, 2) omitting some expressions in Korean, and 3) reordering some expressions. We argue that it is important to utilize the spatial and motioning dimensionality of a sign language in order to minimize information loss and distortion. We also argue that the right decision to omit or to merge some expressions in Korean plays a key role in exploiting this dimensionality. Finally, we present a system that converts sentences in Korean into corresponding animations in the sign language as evidence for our claim.

Dynamic Informative Link Annotation for Biological Text over Heterogeneous Databases

Hodong Lee and Jong C. Park
16th International Conference on Genome Informatics (GIW), Yokohama, Japan, December, 2005.
Linking from textual objects to biological databases is actively pursued for efficient data access and information enrichment [2]. This task targets links for particular types of terms, such as names, keywords, and symbols, that correspond to individual data entries. However, such one-to-one matching links are still insufficient to make full use of the biological data in numerous databases. Previous studies have reported further problems: (1) a conceptual term referring to multiple data objects cannot be represented as a one-to-one link [1]; (2) a complex term often corresponds to data objects from multiple databases [6]; (3) a link must remain consistent with data objects that can be changed or removed from a database [4]; and (4) a term can be ambiguous owing to semantic and syntactic heterogeneity, which requires not only structural and operational database information but also biological knowledge about the term's semantics [4, 5]. We address all of the problems above with a dynamic link annotation system that annotates links by formulating database statements in a formal query language. We are currently developing the system for 13 molecular biology databases mediated by SRS and Entrez: GO, GOA, UniProt, InterPro, EMBL, and Enzyme in SRS; Gene, Protein, Nucleotide, PubMed, OMIM, HomoloGene, and Taxonomy in Entrez.

Vowel Sound Disambiguation for Intelligible Korean Speech Synthesis

Ho-Joon Lee and Jong C. Park
Proceedings of the 19th Pacific Asia Conference on Language, Information and Computation (PACLIC 19), pp. 131-142, Taipei, Taiwan, December, 2005.
For speech synthesis systems that transform text materials into voice data, correctness and naturalness are the crucial measures of performance, the latter gaining more emphasis recently. In order to make synthesized voices natural, we must take into account pronunciation-related linguistic phenomena such as homographs, among others. The syntax certainly provides an important clue to disambiguating such homographs, but the relatively free word order in the Korean language makes it hard to utilize such information. In this paper, we describe a computational generation of contextually appropriate vowel lengths for the words in Korean by utilizing a higher level of linguistic information in a Combinatory Categorial Grammar framework. We consider part-of-speech information, the possibility of conjunction with a suffix, case information, unconjugated adjectives, numerals, numerical adjectives with related nouns, and the relationship between a noun and its predicate as syntactic and semantic clues for vowel sound disambiguation. The results are expressed in Speech Synthesis Markup Language (SSML) for a target-system-neutral application. The proposed system with correctly predicted vowel sound can be used not only as an educational tool, but also as a plug-in for enhancing the intelligibility of a general purpose Text-to-Speech (TTS) system.

Text Animation with Music

Doojin Park and Jong C. Park
Proceedings of the 32nd Korea Information Science Society (KISS), Vol. 32, No. 2, pp. 526-528, Seoul, November, 2005.
Music plays an important role in storytelling by conveying the mood and flow of a story. Much recent work aims to insert suitable music into computer animation automatically, but most of it addresses synchronization with video material rather than animation that tells a story. Text animation is a line of research that automatically analyzes fairy tales to produce animation. In this paper, we show how musical features that fit the mood of each scene can be extracted automatically on the basis of the narrative structure of a fairy tale, and discuss how these features can be used to insert music into text animation.

Prediction of RNA Secondary Structures in a Combinatory Categorial Grammar Framework

Hee-Jin Lee and Jong C. Park
Proceedings of the First International Symposium on Languages in Biology and Medicine (LBM), pp. 59-62, KAIST, Daejeon, Korea, November, 2005.
In this paper, we define a Combinatory Categorial Grammar (CCG) to model and predict RNA secondary structures. The proposed CCG can be used to capture various RNA secondary structures, including stem-loop and pseudoknot structures. We also argue that the CCG can be used to predict possibly unknown RNA secondary structures, for example an undiscovered structure 'ternary-pseudoknots'.

Automated Linking of Conceptual and Complex Terms into Data Objects in Biological Databases

Hodong Lee and Jong C. Park
Proceedings of the First International Symposium on Languages in Biology and Medicine (LBM), pp. 51-54, Creative Learning Building, KAIST, Daejeon, Korea, November, 2005.
The purpose of a textual link is to provide a one-to-one connection between a term and a related data object. However, this link is insufficient to deal with the conceptual and complex terms that are often used to refer to multiple data objects from heterogeneous databases. In this paper, we present a method that can dynamically create a link to a biological term by automatically constructing a database query for a search into the corresponding data object(s). This method can help the user to quickly build a hypothesis based on data drawn from text, as well as to understand the text by providing access to relevant information for its biological terms.
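As a rough illustration of linking one term to several databases through a constructed query, the following minimal sketch builds one search URL per database. It is not the authors' system: the abstract does not specify the query language or endpoint, so the use of the NCBI E-utilities `esearch` endpoint, the function name `build_links`, and the example term are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities search endpoint, chosen only for illustration.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_links(term, databases):
    """Return one search URL per database for a (possibly complex) term.

    A single term maps to several queries, so the link is one-to-many
    rather than one-to-one. No network request is made here.
    """
    return {
        db: ESEARCH + "?" + urlencode({"db": db, "term": term})
        for db in databases
    }


# Hypothetical example term and database names.
links = build_links("p53 AND apoptosis", ["gene", "protein"])
for db, url in links.items():
    print(db, url)
```

Because the URLs are generated at lookup time rather than stored, a link of this kind stays valid when entries are added to or removed from the target database.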

Generation of Coherent Gene Summary with Concept-Linking Sentences

Chan-Goo Kang and Jong C. Park
Proceedings of the First International Symposium on Languages in Biology and Medicine (LBM), pp. 41-45, Creative Learning Building, KAIST, Daejeon, Korea, November, 2005.
Typical approaches to automatic summarization make efforts to generate a coherent document by arranging the order of sentences according to certain criteria such as the publication date of the text in which the expression appears. However, when describing a gene, there is no obvious order whatsoever among the facts to be presented. In this work, while generating a summary about a gene, we actually create the order from the unordered set of facts, by introducing new sentences that make associations among the main concepts of those facts.

CCG-based RNA Secondary Structure Prediction

Hee-Jin Lee and Jong C. Park
The First International Symposium on Languages in Biology and Medicine (LBM), Daejeon, Korea, November, 2005.
In this paper, we define a Combinatory Categorial Grammar (CCG) to model and predict RNA secondary structures. The proposed CCG can be used to capture various RNA secondary structures, including stem-loop and pseudoknot structures. We also argue that the CCG can be used to predict possibly unknown RNA secondary structures, for example an undiscovered structure 'ternary-pseudoknots'.

Dynamic and Informative Linking from Biological Text into Heterogeneous Databases

Hodong Lee and Jong C. Park
The First International Symposium on Languages in Biology and Medicine (LBM), Daejeon, Korea, November, 2005.
Linking from textual objects to biological databases is actively pursued for efficient data access and information enrichment [2]. This task targets links for particular types of terms, such as names, keywords, and symbols, that correspond to individual data entries. However, such one-to-one matching links are still insufficient to make full use of the biological data in numerous databases. Previous studies have reported further problems: (1) a conceptual term referring to multiple data objects cannot be represented as a one-to-one link [1]; (2) a complex term often corresponds to data objects from multiple databases [6]; (3) a link must remain consistent with data objects that can be changed or removed from a database [4]; and (4) a term can be ambiguous owing to semantic and syntactic heterogeneity, which requires not only structural and operational database information but also biological knowledge about the term's semantics [4, 5]. We address all of the problems above with a dynamic link annotation system that annotates links by formulating database statements in a formal query language. We are currently developing the system for 13 molecular biology databases mediated by SRS and Entrez: GO, GOA, UniProt, InterPro, EMBL, and Enzyme in SRS; Gene, Protein, Nucleotide, PubMed, OMIM, HomoloGene, and Taxonomy in Entrez.

Intonation Synthesis using Emotional Information from Spoken Fairy Tale

Ho-Joon Lee and Jong C. Park
Proceedings of the 17th Korean Association of Speech Science (KASS), pp. 88-97, Seoul, November 26, 2005.
As advances in information technology bring user-centered interfaces to the fore, speech synthesis technology is being used more and more widely. For natural speech synthesis it is important to generate intonation information suited to the utterance situation, and for synthesis that sounds natural as emotions change it is necessary, above all, to adjust pitch appropriately. Applying emotional information to speech synthesis first requires an analysis of speech data in which emotion is well expressed. Recordings of narrated fairy tales are one such resource, since they express emotion richly in order to convey the content to children more vividly. In this study, we analyze traditional puppet plays recorded by professional storytellers to examine the phonological characteristics of utterances under different emotional states, and, based on the relation between this emotional information and linguistic information such as the syntactic and semantic structure of sentences, discuss how emotional information can be supplied to a speech synthesis system and rendered appropriately.

Modeling Causality in Biological Pathways for Logical Identification of Drug Targets

Il Park and Jong C. Park
Proceedings of the 2005 International Joint Conference of InCoB, AASBi and KSBI (Bioinfo 2005), pp. 373-378, Busan, Korea, September, 2005.

Lexical Disambiguation for Intonation Synthesis: A CCG Approach

Ho-Joon Lee and Jong C. Park
Korean Society for Language and Information (KSLI), June 17-18, 2005.
With the rapid development of IT, new forms of information delivery keep emerging, and awareness of the correct pronunciation of Korean is gradually weakening. The situation is serious for long and short vowels in particular, which even speech professionals fail to distinguish accurately. In this paper, we examine vowel lengthening and shortening in Korean nouns on the basis of their relations to surrounding words, and discuss how to distinguish vowel length for homonymous nouns that are pronounced differently, focusing on the relations between a noun and its modifiers and between a noun and its predicate. The results of the analysis are expressed in Combinatory Categorial Grammar, and the speech synthesis process with lexical ambiguity resolved is described in the standardized SSML (Speech Synthesis Markup Language).

Induced Extension of Gene Ontology from Biomedical Resources with Flexible Identification of Candidate Terms

Jin-Bok Lee, Jung-jae Kim, and Jong C. Park
The First International Symposium on Semantic Mining in Biomedicine (SMBM), page 13, Cambridge, UK, April, 2005.
Motivation: We present a novel method to predict more detailed terms than those in the present Gene Ontology (GO). We apply this method to semantic tagging for natural language expressions that denote potential GO terms even when there is no direct mapping of such expressions into GO terms. The terms that are newly identified in this process can be used to extend GO by utilizing semantic relations such as hyponyms or synonyms. Finally, we suggest how to find a suitable direction for the possible extension of an ever-growing ontology such as GO.
Results: We provide an automatically extended GO, and tools for its manipulation and validation.
Availability: http://www.biopathway.org
Contact: park@nlp.kaist.ac.kr

Deciding When to Stop: Enhancing the Performance of Information Extraction with Deeper Linguistic Analysis

Jung-jae Kim and Jong C. Park
Proceedings of the 3rd Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing, pp. 41-45, Muju Resort, Jeonbuk, South Korea, February, 2005.

Information Visualization with Text Data Mining for Knowledge Discovery Tools in Bioinformatics

Jinah Park, Changsu Lee, and Jong C. Park
Key Engineering Materials, Vols. 277-279, pp. 259-265, 2005. (SCI IF 0.224)
An abundant amount of information is produced in the digital domain, and an effective information extraction (IE) system is required to surf through this sea of information. In this paper, we show that an interactive visualization system works effectively to complement an IE system. In particular, three-dimensional (3D) visualization can turn a data-centric system into a user-centric one by facilitating the human visual system as a powerful pattern recognizer to become a part of the IE cycle. Because information as data is multidimensional in nature, 2D visualization has been the preferred mode. However, we argue that the extra dimension available for us in a 3D mode provides a valuable space where we can pack an orthogonal aspect of the available information. As for candidates of this orthogonal information, we have considered the following two aspects: 1) abstraction of the unstructured source data, and 2) the history line of the discovery process. We have applied our proposal to text data mining in bioinformatics. Through case studies of data mining for molecular interaction in the yeast and mitogen-activated protein kinase pathways, we demonstrate the possibility of interpreting the extracted results with a 3D visualization system.

A Graphic Tool for Curating Molecular Interaction Networks from the Literature

Changsu Lee, Jinah Park, and Jong C. Park
International Journal of Computers in Biology and Medicine, Vol. 35, pp. 555-564, 2005.
We propose a graphic tool for curating molecular interaction networks constructed from the literature by information extraction (IE). In order to turn preliminary results from IE into useful biomedical resources, we propose to use a controlled environment in which visualization and IE work synergistically. The usability of the proposed graphic tool is shown with respect to the identification of incorrectly extracted results that are due to the troublesome coordination phenomena in natural language texts. Through the experiment on molecular interactions in Saccharomyces cerevisiae, we have seen a meaningful increase (from 91.5% to 97.5%) in the proportion of correctly extracted interaction information.

Automatic Generation of Multimedia Animation from Play Scripts

Doojin Park and Jong C. Park
HCI Conference Korea, 2005.
Text animation is research on automatically generating animation from natural language sentences. Realizing a text animation as its author intended requires not only character actions but also additional multimedia effects. Ordinary text does not supply enough information about such effects, but scripts written for theatrical performance present a variety of additional information in a fairly formulaic way. In this paper, we show how the dialogue, stage directions, and narration of a play script can be analyzed automatically to generate multimedia animation in which character actions and sound are integrated. Sound is a basic device for dramatic effect. For it to be integrated effectively with character actions, the sound effects expressed in the script must be extracted directly, or suitable sounds must be inferred from commonsense information, and then rendered spatially and in step with the flow of time. To this end, we analyze the natural language expressions of the play script with Combinatory Categorial Grammar to extract the interactions between character actions and sound effects, and generate the resulting actions and sound effects as multimedia animation using a 3D model database and a sound database.

Identification of Emotional Flow from Natural Language Documents

Hye-Jin Min
MS thesis, KAIST, 2005.

Extracting contrastive information from negation patterns in biomedical literature

Jung-jae Kim and Jong C. Park
ACM Transactions on Asian Language Information Processing (TALIP), Special Issue on Text Mining and Management in Biomedicine, 2006.
Expressions of negation in the biomedical literature often encode information of contrast as a means for explaining significant differences between the objects that are so contrasted. We show that such information gives additional insights into the nature of the structures and/or biological functions of these objects, leading to valuable knowledge for subcategorization of protein families by the properties that the involved proteins do not have in common. Based on the observation that the expressions of negation employ mostly predictable syntactic structures that can be characterized by subclausal coordination and by clause-level parallelism, we present a system that extracts such contrastive information by identifying those syntactic structures with natural language processing techniques and with additional linguistic resources for semantics. The implemented system shows the performance of 85.7% precision and 61.5% recall, including 7.7% partial recall, or an F score of 76.6. We apply the system to the biological interactions as extracted by our biomedical information extraction system in order to enrich proteome databases with contrastive information.
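The reported scores can be cross-checked with a short calculation. The sketch below assumes the F score is the balanced harmonic mean of precision and recall. Under that assumption, the reported 76.6 is reproduced when the 7.7% partial recall is added to the 61.5% figure. This reading is an inference from the numbers, not something the abstract states.

```python
def f_score(precision, recall):
    """Balanced F score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision = 0.857       # reported precision
full_recall = 0.615     # reported recall
partial_recall = 0.077  # reported partial recall

# Recall counted without and with partial matches (the latter is an assumption).
strict = f_score(precision, full_recall)
lenient = f_score(precision, full_recall + partial_recall)

print(round(100 * strict, 1))   # 71.6
print(round(100 * lenient, 1))  # 76.6
```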

Introduction to the Thematic Session on Text Mining in Biomedicine

Sophia Ananiadou and Jong C. Park
Lecture Notes in Artificial Intelligence (LNAI), Vol. 3248 (revised selected papers from IJCNLP 2004), editors: K-Y Su, J. Tsujii, J.-H. Lee, O. Y. Kwong, p. 776, 2005.
This thematic session follows a series of workshops and conferences recently dedicated to text mining in biology. This interest is due to the overwhelming amount of biomedical literature (Medline alone contains over 14M abstracts) and the urgent need to discover and organise knowledge extracted from texts. Text mining techniques such as information extraction and named entity recognition have been successfully applied to biomedical texts with varying results. A variety of approaches, such as machine learning, SVMs, and shallow and deep linguistic analyses, have been applied to biomedical texts to extract, manage and organize information. There are over 300 databases containing crucial information on biological data. One of the main challenges is the integration of such heterogeneous information, from factual databases to texts. One of the major knowledge bottlenecks in biomedicine is terminology. In such a dynamic domain, new terms are constantly created. In addition, there is not always a mapping among terms found in databases, controlled vocabularies, ontologies and "actual" terms which are found in texts. Term variation and term ambiguity have been addressed in the past but more solutions are needed. The confusion over what is a descriptor, a term, or an index term accentuates the problem. Solving the terminological problem is paramount to biotext mining, as relationships linking new genes, drugs, proteins (i.e. terms) are important for effective information extraction. Mining for relationships between terms and their automatic extraction is important for the semi-automatic updating and populating of ontologies and other resources needed in biomedicine. Text mining applications such as question-answering, automatic summarization, and intelligent information retrieval are based on the existence of shared resources, such as annotated corpora (GENIA) and terminological resources. The field needs more concentrated and integrated efforts to build these shared resources.
In addition, evaluation efforts such as BioCreaTive and Genomic Trec are important for biotext mining techniques and applications. The aim of text mining in biology is to provide solutions to biologists and to aid curators in their task. We hope this thematic session addressed techniques and applications which aid the biologists in their research.

Constructing SSML Documents with Automatically Generated Intonation Information in a Combinatory Categorial Grammar Framework

Lee Hwa Jin, Ho-Joon Lee, and Jong C. Park
International Journal of Computer Processing of Oriental Languages (IJCPOL), Vol. 17, No. 4, pp. 223-238, December, 2004.
As of now, Text-to-Speech (TTS) systems are widely used throughout the full spectrum of our activities, and various natural language processing techniques have been utilized to enhance the performance of such TTS systems. As TTS systems begin to play an important role for communication between human and machine, naturalness is considered the most crucial measure of performance for TTS systems, in addition to correctness. General statistical approaches, though widely adopted, are not appropriate for the phenomena as they assign the same intonation to the same sentence. We analyze various kinds of corpora to extract informative features for intonation generation in a Combinatory Categorial Grammar framework, and express intonation-annotated documents using Speech Synthesis Markup Language for target-system-neutral application.
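To give a sense of what an intonation-annotated SSML document might look like, here is a minimal sketch that wraps phrases in SSML `prosody` elements. It does not reproduce the paper's CCG analysis: the phrase segmentation, the pitch labels, and the function name `to_ssml` are hypothetical, and in the paper those values would come from the grammar-based analysis rather than being written by hand.

```python
import xml.etree.ElementTree as ET


def to_ssml(phrases, lang="ko-KR"):
    """Build an SSML string from (text, pitch) pairs.

    `pitch` is an SSML prosody label such as "low", "medium" or "high".
    The labels are supplied by the caller; nothing is predicted here.
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": lang,
    })
    sentence = ET.SubElement(speak, "s")
    for text, pitch in phrases:
        prosody = ET.SubElement(sentence, "prosody", {"pitch": pitch})
        prosody.text = text
        prosody.tail = " "  # keep a space between adjacent phrases
    return ET.tostring(speak, encoding="unicode")


# Hypothetical annotation: the same words could receive other labels in another context.
ssml = to_ssml([("Did you", "medium"), ("really", "high"), ("go?", "x-high")])
print(ssml)
```

Keeping the annotation in SSML rather than in an engine-specific format is what makes the output usable with different TTS systems.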

Emotion Prediction from Natural Language Documents with Emotion Network

Hye-Jin Min and Jong C. Park
Proceedings of HLT, pp. 191-199, Ulsan, October, 2004.
In this paper, we propose a model that recognizes the emotional states expressed in text, and describe a system that uses this model to predict both the emotion expressed in the current sentence and the emotional states that will appear afterwards. Building a computer system that recognizes the user's emotion and interacts with people through natural messages and actions in response requires an emotion model that captures not only the current emotional state but can also predict the emotions the user may feel later, using information about the individual user and about the situation in which the user is interacting with the system. We propose the Emotion Network, a state-transition emotion model that reflects the previously identified emotional state, the relation between actual and expressed emotions, the characteristics of the surrounding objects that influenced the emotion, and the goals and actions of the person experiencing the emotion. The Emotion Network consists of states that represent individual emotions, transitions between connected states, and conditions under which a transition occurs. We apply the Emotion Network to counseling examples in text form and show that it can predict emotions that are not directly expressed by the emotion words in the document.
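The three components named in the abstract (states, transitions, and conditions) can be pictured as a small state machine. The sketch below is only an illustration of that general structure: the emotion labels, the context keys, and the class names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transition:
    source: str                        # state the transition leaves
    target: str                        # state the transition enters
    condition: Callable[[dict], bool]  # must hold for the transition to fire


class EmotionNetwork:
    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions

    def step(self, context):
        """Fire the first transition from the current state whose condition holds."""
        for t in self.transitions:
            if t.source == self.state and t.condition(context):
                self.state = t.target
                break
        return self.state


# Hypothetical network: conditions read cues extracted from each sentence.
net = EmotionNetwork("neutral", [
    Transition("neutral", "anxiety", lambda c: c.get("goal_blocked", False)),
    Transition("anxiety", "relief", lambda c: c.get("goal_achieved", False)),
])
trace = [net.step(c) for c in [{"goal_blocked": True}, {}, {"goal_achieved": True}]]
print(trace)  # ['anxiety', 'anxiety', 'relief']
```

Because the current state persists between sentences, the second sentence is still labeled with an emotion even though it carries no cue of its own, which mirrors the idea of predicting emotions that are not expressed directly.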

Identification and Recovery of Elided Information for Text Animation

Eunyoung Chang and Jong C. Park
Proceedings of HLT, pp. 94-102, Ulsan, October, 2004.
A typical problem in applying speech recognition technology to everyday use is malfunction caused by the recognizer's low recognition rate. In this study, we present an HTK (Hidden Markov Model Toolkit) continuous speech recognition system for the telebanking domain, together with a maximum-entropy-based method for measuring recognition confidence for the key words (mainly proper nouns) in user utterances. Recognition confidence was computed from both acoustic and linguistic features, and we confirmed that misrecognition of recognized words can be detected with an accuracy of about 86%. We intend to use this confidence measure in developing a clarification dialog model for speech recognition.

Constructing VoiceXML documents with Contextually Appropriate Intonation from Natural Language Dialogues in a Combinatory Categorial Grammar framework

Lee Hwa Jin, Ho-Joon Lee, and Jong C. Park
Proceedings of the 5th China-Korea Joint Symposium on Oriental Language Processing and Pattern Recognition, pp. 2-9, Qingdao, P.R. China, February 25-27, 2004.
Various natural language processing techniques have been utilized to enhance the performance of Text-to-Speech (TTS) systems to date. Correctness and naturalness are among the working measures for the performance of these systems, where the usual proposals to satisfy the second measure have employed statistical prediction methods to find appropriate intonation for a given sequence of words in a sentence. However, these proposals tend to assign the same intonation to the same word sequence in a sentence, whereas people may associate quite different kinds of intonation with the same word sequence in a sentence depending upon the context in which the sentence is expressed. In this paper, we use a combinatory categorial grammar approach to synthesizing contextually appropriate intonation for dialogues in Korean, taking into account the distinguishing characteristics as identified from the speech corpus. The intonation-annotated dialogues are then translated into corresponding VoiceXML documents, which work as direct inputs to a TTS system for the generation of actual speech data.

Anaphora Resolution in Text Animation

Kyung Wha Hong and Jong C. Park
Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, pp. 347-352, Innsbruck, Austria, February, 2004.
For effective text animation from natural language stories, the source sentences in natural language should be processed not only individually but also as a coherent story as a whole. In particular, it is important that anaphoric expressions are interpreted adequately, since they provide crucial clues for the overall behaviors of story line characters. In text understanding, the task of anaphora resolution has focused primarily on nominal expressions. In text animation, however, there are many other important candidates for anaphoric expressions, including those for actions and events, in addition to objects. In this paper, we provide an analysis of sample fairy tales, and present a classification for the types of anaphoric expressions for text animation. We also describe an implemented text animation system with anaphora resolution.

BioIE: Retargetable Information Extraction and Ontological Annotation of Biological Pathways from the Literature

Jung-jae Kim and Jong C. Park
Journal of Bioinformatics and Computational Biology (JBCB), Vol. 2, No. 3, pp. 551-568, 2004. (SCI IF 1.393)
The need for extracting general biological interactions of arbitrary types from the rapidly growing volume of the biomedical literature is drawing increased attention, while the need for this much diversity also requires both a robust treatment of complex linguistic phenomena and a method to consistently characterize the results. We present a biomedical information extraction system, BioIE, to address both of these needs by utilizing a full-fledged English grammar formalism, or a combinatory categorial grammar, and by annotating the results with the terms of Gene Ontology, which provides a common and controlled vocabulary. BioIE deals with complex linguistic phenomena such as coordination, relative structures, acronyms, appositive structures, and anaphoric expressions. In order to deal with real-world syntactic variations of ontological terms, BioIE utilizes the syntactic dependencies between words in sentences as well, based on the observation that the component words in an ontological term usually appear in a sentence with known patterns of syntactic dependencies.

Case Study: Visualization and Analysis of Mitogen-Activated Protein Kinase Pathways in the Literature

Changsu Lee, Jinah Park, and Jong C. Park
Conference on Visualization and Data Analysis (VDA), pp. 275-285, San Jose, USA, January, 2004.
Data sets of up to 3000 journal abstracts from MEDLINE literature on the keyword combination 'MAPK pathway' and 'human' are visualized and analyzed for mitogen-activated protein kinase (MAPK) pathways. We have tightly coupled exploratory visualization with information extraction for interactive navigation through scattered information sources, in search of useful facts on MAPK by frequency-based filtering and amplification. Unlike direct database visualization that operates on curated data sets, literature visualization has the advantages of manipulating data sets of a massive scale with a lot less manpower and effectively responding to the fast cycles of the developments in the field.

BioAR: Anaphora Resolution for Relating Protein Names to Proteome Database Entries

Jung-jae Kim and Jong C. Park
ACL Workshop on Reference Resolution and its Applications, pp. 79-86, Barcelona, Spain, 2004.
The need for associating, or grounding, protein names in the literature with the entries of proteome databases such as Swiss-Prot is well-recognized. The protein names in the biomedical literature show a high degree of morphological and syntactic variations, and various anaphoric expressions including null anaphors. We present a biomedical anaphora resolution system, BioAR, in order to address the variations of protein names and to further associate them with Swiss-Prot entries as the actual entities in the world. The system shows the performance of 59.5% to 75.0% precision and 40.7% to 56.3% recall, depending on the specific types of anaphoric expressions. We apply BioAR to the protein names in the biological interactions as extracted by our biomedical information extraction system, or BioIE, in order to construct protein pathways automatically.

Annotation of Gene Products in the Literature with Gene Ontology Terms using Syntactic Dependencies

Jung-jae Kim and Jong C. Park
Proceedings of the 1st International Joint Conference on Natural Language Processing (IJCNLP), pp. 528-534, Hainan, P.R.China, 2004.
We present a method for automatically annotating gene products in the literature with the terms of Gene Ontology (GO), which provides a dynamic but controlled vocabulary. Although GO is well-organized with such lexical relations as synonymy, ‘is-a’, and ‘part-of’ relations among its terms, GO terms show quite a high degree of morphological and syntactic variations in the literature. As opposed to the previous approaches that considered only restricted kinds of term variations, our method uncovers the syntactic dependencies between gene product names and ontological terms as well in order to deal with real-world syntactic variations, based on the observation that the component words in an ontological term usually appear in a sentence with established patterns of syntactic dependencies.

Research Trends in Bio Text Mining

Jung-jae Kim and Jong C. Park
Korea Information Science Society SIGBIT News Letter, Vol. 2, No. 1, pp. 14-31, 2004.

An Analysis of Syntactic and Semantic Relations between Negative Polarity Items and Negatives in Korean

Jung-jae Kim and Jong C. Park
Journal of Language and Information, Vol. 8, No. 1, pp. 53-76, 2004.
Negative polarity items (NPIs), which function as quantifiers, are licensed in a syntactically strict way by negatives, which function as qualifiers, resulting in universal negating interpretations as pairs. We present a proposal to explain the related phenomena, in which the syntax and the semantics are closely related to each other, with Combinatory Categorial Grammar. For this purpose, we first adopt the usual approach to scrambling, but control its overgeneration with the use of markers, taking into account the complex syntactic phenomena involving NPIs and scrambling in Korean. We also propose to utilize polarity intensity as a novel feature, in order to account for the universal negating interpretations when NPIs are combined with negatives. Our proposal also explains the difference in readings when other quantifiers or qualifiers intervene between the NPIs and the related negatives. (Korea Advanced Institute of Science and Technology)

Automatic Camera Control for Automated Digital Cinematography from Text

Semin Jang and Jong C. Park
Proceedings of the 31st KISS Spring Conference, Vol. 31, No. 1(B), pp. 904-906, KAIST, Korea, 2004.
μ˜ν™”λ₯Ό μ œμž‘ν•˜λŠ” 과정에 ν•„μˆ˜μ μœΌλ‘œ μ‚¬μš©λ˜κ³  μžˆλŠ” λŒ€λ³Έ(θ‡Ίζœ¬)μ—λŠ” ν•„μš”ν•œ λΆ€λΆ„λ§ˆλ‹€ μ˜μƒκΈ°λ²•μ΄ λͺ…μ‹œλ˜μ–΄ μžˆμ–΄μ„œ μ‹€μ œ μž₯면을 κ΅¬ν˜„ν•˜λŠ” 과정에 μ›μž‘μžκ°€ μ˜λ„ν•˜λŠ” 상황을 비ꡐ적 μ •ν™•ν•˜κ²Œ μž¬ν˜„ν•˜λŠ” 것이 κ°€λŠ₯ν•˜λ‹€. 이에 λΉ„ν•˜μ—¬ ꡐ톡사고 μ‚¬κ±΄λ³΄κ³ μ„œλ‚˜ 동화 등을 기반으둜 디지털 μ˜μƒμ„ μžλ™μœΌλ‘œ μ œμž‘ν•˜λ €λŠ” 경우 μ΄λŸ¬ν•œ μ˜μƒκΈ°λ²•μ΄ λͺ…μ‹œλ˜μ–΄ μžˆμ§€ μ•Šλ‹€. κ·ΈλŸ¬λ―€λ‘œ μžμ—°μ–Έμ–΄λ‘œ 기술된 자료둜 λΆ€ν„° 디지털 μ˜μƒμ„ μžλ™μœΌλ‘œ μ œμž‘ν•˜κΈ° μœ„ν•΄μ„œλŠ” μž‘κ°€μ˜ μ˜λ„λ₯Ό νŒŒμ•…ν•˜μ—¬ μ μ ˆν•œ μ˜μƒκΈ°λ²•μ„ μΆ”μΆœ ν•˜λŠ” λ°©μ•ˆμ΄ μžˆμ–΄μ•Ό ν•œλ‹€. λ³Έ λ…Όλ¬Έμ˜ μ„ ν–‰ μ—°κ΅¬μ—μ„œλŠ” 동화λ₯Ό λŒ€μƒμœΌλ‘œ ν•˜λŠ” μ• λ‹ˆλ©”μ΄μ…˜ μžλ™ 생성을 μœ„ν•΄μ„œ μ‹œκ°„ 관리, μ°Έμ‘° ν•΄κ²°, μœ„μΉ˜ μ„€μ •, μ„ΈλΆ€ λͺ…λ Ή κ²°μ • 및 λ‹€μˆ˜ 캐릭터 μ œμ–΄ λ“±μ˜ μš”μ†Œ 기술이 ν•„μš”ν•˜λ‹€λŠ” 것을 보이고 특히 μ‹œκ°„ 관리 μ€‘μ—μ„œ μ μ ˆν•œ μž₯λ©΄μ „ν™˜μ΄ ν•„μš”ν•œ 경우λ₯Ό μžλ™μœΌλ‘œ νŒŒμ•…ν•˜λŠ” λ°©μ•ˆμ„ μ œμ‹œν•˜μ˜€λ‹€. λ³Έ λ…Όλ¬Έμ—μ„œλŠ” 결합범주문법을 μ‚¬μš©ν•˜μ—¬ 동화 λ¬Έμž₯에 λ‚˜νƒ€λ‚˜λŠ” μž‘κ°€μ˜ μ˜λ„λ₯Ό λΆ„μ„ν•˜κ³ , 이에 λΆ€ν•©ν•˜λŠ” λ‹€μ–‘ν•œ 카메라 μš΄μš©κΈ°λ²•μ„ μžλ™μœΌλ‘œ νŒŒμ•…ν•˜μ—¬ μ μš©ν•œ 디지털 μ˜μƒ μ œμž‘ λ°©μ•ˆμ„ μ œμ‹œν•˜κ³  κ΅¬ν˜„ν•œ μ‹œμŠ€ν…œμ„ 보인닀.

Automatic Generation of Multimedia Therapeutic Contents with Combinatory Categorial Grammar

Hye-Jin Min and Jong C. Park
HCI/CG/VR/UI/DESIGN, Phoenix Park, 2004.
μΈν„°λ„·μ˜ λ°œλ‹¬λ‘œ λŒ€μ•ˆμ μΈ μ‹¬λ¦¬μΉ˜λ£Œ 방법이라 ν•  수 μžˆλŠ” μƒλ‹΄μΉ˜λ£Œ, μŒμ•…μΉ˜ 료 및 λ―Έμˆ μΉ˜λ£Œκ°€ 개인의 고민을 상담해 μ£ΌλŠ” 인터넷 μ‚¬μ΄νŠΈμ—μ„œ ν™œλ°œνžˆ 제 곡되고 μžˆλ‹€. λ³Έ λ…Όλ¬Έμ—μ„œλŠ” λ‚΄λ‹΄μžμ˜ 고민이 λ‹΄κΈ΄ 글을 μžλ™μœΌλ‘œ λΆ„μ„ν•˜μ—¬ λ‚΄λ‹΄μžμ˜ 감정 μƒνƒœμ™€ 고민의 원인 정보λ₯Ό νŒŒμ•…ν•˜μ—¬ κΈ€, κ·Έλ¦Ό, μŒμ•… 등이 톡 ν•©λœ λ©€ν‹°λ―Έλ””μ–΄ 치료 정보λ₯Ό μƒμ„±ν•˜λŠ” 과정을 보인닀. λ©€ν‹°λ―Έλ””μ–΄ 치료 μ • λ³΄λŠ” ν•΄λ‹Ή κ°μ •μ˜ ν•΄μ†Œμ— 도움을 쀄 수 μžˆλŠ” ν…μŠ€νŠΈ, 이미지 및 μŒμ•…νŒŒμΌμ΄ 심리적인 치료의 λͺ©μ μœΌλ‘œ 검색어와 ν•¨κ»˜ κ΅¬μ‘°ν™”λ˜μ–΄ μžˆλŠ” 정보λ₯Ό μ§€μΉ­ν•œλ‹€. λ©€ν‹°λ―Έλ””μ–΄ 치료 정보λ₯Ό κ΅¬μΆ•ν•˜κΈ° μœ„ν•œ 검색어λ₯Ό μžλ™μœΌλ‘œ μƒμ„±ν•˜κΈ° μœ„ν•΄μ„œ λŠ” λ¬Έμž₯μ—μ„œ 고민에 κ΄€λ ¨λ˜λŠ” λ‚΄λ‹΄μžμ˜ κ°μ •ν‘œν˜„ 방식 및 의미 관계, 그리고 ν•΄λ‹Ή κ°μ •μ˜ κ²½κ³Ό μ‹œκ°„ 정보 등을 적절히 뢄석해내야 ν•˜λ―€λ‘œ, ν‚€μ›Œλ“œμ— 따라 이에 λ§žλŠ” 감정을 λŒ€μ‘μ‹œν‚€κ±°λ‚˜ 상식을 μ΄μš©ν•˜μ—¬ μΆ”λ‘ ν•˜λŠ” 방법을 ν™œμš©ν•˜μ—¬ 감정 정보λ₯Ό μΆ”μΆœν•˜λŠ” 기쑴의 μ—°κ΅¬μ—μ„œλŠ” 닀루지 μ•Šμ•˜λ˜ 좔가적인 언어적 특 성듀이 보닀 μ‹¬λ„μžˆκ²Œ κ³ λ €λ˜μ–΄μ•Ό ν•œλ‹€. λ³Έ λ…Όλ¬Έμ—μ„œλŠ” 이λ₯Ό μœ„ν•˜μ—¬ 내포문 μ΄λ‚˜ 접속문과 같은 ν•˜μœ„λ¬Έμ˜ 주어와 μƒμœ„λ¬Έμ˜ μ£Όμ–΄κ°€ μ„œλ‘œ κ°€μ§€λŠ” 관계λ₯Ό μžλ™μœΌλ‘œ νŒŒμ•…ν•˜κ³ , 각 동사가 의미적으둜 μš”κ΅¬ν•˜λŠ” λ¬Έμž₯μ„±λΆ„μ˜ 성격에 따라 κ°μ •μ˜ κ²½ν—˜μ£Ό 및 ν‘œν˜„μ˜ λŒ€μƒμ„ ν™•μΈν•˜λ©° μ‹œκ°„λΆ€μ‚¬λ‘œ κ°μ •λ³€ν™”μƒνƒœλ₯Ό νŒŒμ•… ν•˜λŠ” λ“±μ˜ μžμ—°μ–Έμ–΄μ²˜λ¦¬ 과정을 결합범주문법을 ν†΅ν•˜μ—¬ κ΅¬ν˜„ν•¨μœΌλ‘œμ¨ 이듀 λ¬Έμž₯에 λ‚˜νƒ€λ‚˜ μžˆλŠ” μ‹¬λ¦¬μƒνƒœμ— λŒ€μ‘ν•˜λŠ” 치료 정보λ₯Ό κ΅¬μ‘°ν™”λœ 데이터베이 μŠ€λ‘œλΆ€ν„° κ²€μƒ‰ν•˜μ—¬ λ©€ν‹°λ―Έλ””μ–΄ 치료 정보λ₯Ό μƒμ„±ν•˜λŠ” 과정을 보인닀.

Data-oriented Customized Visual Navigation

Changsu Lee, Jinah Park, and Jong C. Park
HCI/CG/VR/UI/DESIGN, Phoenix Park, 2004.
The vast and rapidly growing amount of available information, driven by advances in storage media and information technology, makes it difficult for users to understand the information. In the pipeline of information use that runs from information sources through information filtering to information presentation, previous work on user customization has generally addressed only the filtering stage. However, if customization becomes possible at the presentation stage, which interacts most closely with the user, users can control more finely the process of obtaining information that fits their purposes. We propose a visualization system that plays an active role and supports user customization. The customization function is implemented with a classification scheme based on the characteristics of the data. Taking biology as the application domain, we propose a method for classifying data according to the characteristics of molecular interaction data, and confirm through experiments that molecular interaction maps customized for each user can be obtained effectively.

Natural Language Response Generation from Relational Database Query Result

Ji-yong Jung and Jong C. Park
HCI/CG/VR/UI/DESIGN, Phoenix Park, 2004.
μžμ—°μ–Έμ–΄ 질의/응닡 μΈν„°νŽ˜μ΄μŠ€λŠ” μ‚¬μš©μžκ°€ νŠΉλ³„ν•œ 지식이 없어도 μ‹œμŠ€ ν…œμ— μ‰½κ²Œ μ ‘κ·Όν•  수 μžˆλ„λ‘ ν•˜μ—¬, μ •λ³΄μ˜ μ œκ³΅μ„ 쉽고 μžμ—°μŠ€λŸ½κ²Œ ν•œλ‹€. κ·Έ λŸ¬λ‚˜ 이에 λŒ€ν•œ 기쑴의 μ—°κ΅¬λŠ” λŒ€λΆ€λΆ„μ΄ μžμ—°μ–Έμ–΄λ₯Ό SQLκ³Ό 같은 데이터베이 슀 접근을 μœ„ν•œ ν˜•μ‹μ–Έμ–΄λ‘œ λ°”κΎΈλŠ” 데 μ΄ˆμ μ„ λ§žμΆ”κ³  있고, μ§ˆμ˜λ‘œλΆ€ν„° μ–»μ–΄ 진 κ²°κ³Όλ₯Ό μ μ ˆν•˜κ²Œ ν‘œν˜„ν•˜λŠ” 응닡 생성에 μžˆμ–΄μ„œλŠ” 아직 만쑱슀러운 κ²°κ³Όλ₯Ό λ§Œλ“€μ–΄λ‚΄μ§€ λͺ»ν•˜κ³  μžˆλ‹€. μžμ—°μ–Έμ–΄ 응닡 생성을 μœ„ν•΄μ„œλŠ” μ‚¬μš©μžκ°€ μ•Œκ³  있 λŠ” 정보, λ°μ΄ν„°λ² μ΄μŠ€ λ‚΄μž₯ 정보, 그리고 μ‚¬μš©μžκ°€ 질의λ₯Ό ν•¨μœΌλ‘œμ¨ μ–»κ³ μž ν•˜λŠ” 정보가 λ³΅ν•©μ μœΌλ‘œ κ³ λ €λ˜μ–΄μ•Ό ν•œλ‹€. λ˜ν•œ μ‚¬μš©μžκ°€ κΈ°λŒ€ν•˜λŠ” ν˜•νƒœμ˜ 응닡을 μƒμ„±ν•˜κΈ° μœ„ν•΄μ„œλŠ” μ‚¬μš©μžκ°€ μ›ν•˜λŠ” μ‘λ‹΅ν˜•νƒœλ₯Ό 사전에 λͺ¨λΈλ§ν•˜κ³  κ°€μž₯ μ„ ν˜Έλ˜λŠ” μ‘λ‹΅ν˜•νƒœλ₯Ό μ‚¬μš©ν•΄μ•Ό ν•œλ‹€. λ³Έ μ—°κ΅¬μ—μ„œλŠ” μ‚¬μš©μžμ˜ μ§ˆμ˜λ‘œλΆ€ ν„° 얻어진 κ΄€κ³„ν˜• λ°μ΄ν„°λ² μ΄μŠ€ 검색 결과에 λŒ€ν•΄ 질의의 μ˜λ„μ— 맞게 κ°œλ³„ ν™”λœ 응닡을 μƒμ„±ν•˜λŠ” 과정을 닀룬닀. μ μ ˆν•œ 응닡 생성을 μœ„ν•΄μ„œ μ—¬ν–‰μƒν’ˆ 정보에 λŒ€ν•œ μ‚¬μš©μžμ˜ 질의/응닡 μ½”νΌμŠ€λ₯Ό μ •λ³΄μ˜ λ‚΄μš© 및 λΆ„λŸ‰ μΈ‘λ©΄μ—μ„œ λΆ„ μ„ν•œ κ²°κ³Όλ₯Ό 보이고, 이에 따라 λ‚΄μš©κ³„νš, λ¬Έμž₯ ν˜•νƒœ ꡬ성, μ–΄νœ˜ ν‘œν˜„μ˜ μ„Έ 단계λ₯Ό κ±°μΉ˜λŠ” λ¬Έμž₯ 생성 방법을 μ œμ•ˆν•œλ‹€.

Contextual Disambiguation of Adverbial Scopes in Korean for Text Animation

Eunyoung Chang, Kyung Wha Hong, and Jong C. Park
HCI/CG/VR/UI/DESIGN, Phoenix Park, 2004.
μžμ—° μ–Έμ–΄ λ¬Έμž₯으둜 κ΅¬μ„±λœ ν…μŠ€νŠΈλ₯Ό μ• λ‹ˆλ©”μ΄μ…˜μœΌλ‘œ μžλ™ μƒμ„±ν•˜κΈ° μœ„ν•΄ μ„œλŠ” λ¬Έμž₯의 톡사 정보, 의미 정보, λ‹΄ν™” 정보듀을 λ°”νƒ•μœΌλ‘œ 일련의 μ• λ‹ˆλ©”μ΄ μ…˜ λͺ…령듀을 λ„μΆœν•΄ λ‚΄μ•Ό ν•œλ‹€. λΆ€μ‚¬λŠ” μ΄λŸ¬ν•œ λ¬Έμž₯λ“€μ—μ„œ ν•΄λ‹Ή μ• λ‹ˆλ©”μ΄μ…˜ λͺ…λ Ήμ˜ 속성 λ³€ν™” 정도λ₯Ό κ²°μ •ν•˜λ©° λΆ€μ‚¬μ˜ λ‹€μ–‘ν•œ μˆ˜μ‹ λŒ€μƒκ³Ό 의미의 μ •ν™• ν•œ 해석은 ν…μŠ€νŠΈμ˜ μ˜λ„λ₯Ό 효과적으둜 λ°˜μ˜ν•˜λŠ” μ€‘μš”ν•œ 역할을 ν•˜κ²Œ λœλ‹€. κ·ΈλŸ¬λ‚˜ λΆ€μ‚¬μ˜ μˆ˜μ‹ λŒ€μƒ λ²”μœ„κ°€ 맀우 λ„“κ³  κ·Έ μ˜λ―Έλ„ λ‹€μ–‘ν•˜μ—¬, λ‚΄ν¬μ ˆμ΄λ‚˜ 병렬ꡬ쑰λ₯Ό ν¬ν•¨ν•˜λŠ” λ³΅μž‘ν•œ λ¬Έμž₯μ—μ„œλΏλ§Œ μ•„λ‹ˆλΌ λ‹¨λ¬Έμ—μ„œλ„ λΆ€μ‚¬μ˜ κΈ°λŠ₯을 μ •ν™•νžˆ νŒŒμ•…ν•˜λŠ” 것이 μš©μ΄ν•˜μ§€ μ•Šλ‹€. λ³Έ λ…Όλ¬Έμ—μ„œλŠ” μ •ν™•ν•œ ν…μŠ€νŠΈ μ• λ‹ˆλ©” μ΄μ…˜μ„ μœ„ν•œ λΆ€μ‚¬μ˜ 뢄석방법을 μ œμ•ˆν•˜κ³  κ·Έ 처리 κ²°κ³Όλ₯Ό 보인닀. ν˜„μž¬ 이루 μ–΄μ Έ μžˆλŠ” ν•œκ΅­μ–΄ 뢀사에 λŒ€ν•œ μ—°κ΅¬λŠ” 주둜 톡계 기반 ν•™μŠ΅μœΌλ‘œ 뢀사와 ν”Όμˆ˜ μ‹μ–΄μ™€μ˜ ν˜Έμ‘μ„±μ„ ν™œμš©ν•˜μ—¬ ꡬ쑰의 애맀성을 μ²˜λ¦¬ν•˜κ³  μžˆμ„ 뿐 μ•„λ‹ˆλΌ, λΆ€ μ‚¬μ˜ μœ„μΉ˜ μ œμ•½ 정보 쀑 극히 μΌλΆ€λ§Œμ„ μ΄λŸ¬ν•œ ν˜Έμ‘ 관계에 λŒ€ν•œ μ œμ•½ 정보 둜 ν™œμš©ν•˜κ³  μžˆλ‹€. λ³Έ λ…Όλ¬Έμ—μ„œλŠ” μ΄λŸ¬ν•œ 정보에 λ¬Έλ§₯ 정보λ₯Ό 같이 κ³ λ €ν•˜μ—¬ ꡬ쑰적 애맀성을 ν•΄κ²°ν•˜κ³  보닀 μ •ν™•ν•œ 의미λ₯Ό λ„μΆœν•˜κ³ μž ν•œλ‹€. λ³Έ 논문에 μ„œλŠ” λΆ€μ‚¬μ˜ 톡사적, 의미적 뢄석 방법을 μ œμ•ˆν•˜κΈ° μœ„ν•΄μ„œ 결합범주문법을 μ‚¬μš©ν•˜μ˜€κ³ , 이λ₯Ό ν™•μž₯ν•˜μ—¬ νŒŒμƒλΆ€μ‚¬, 뢀사ꡬ, λΆ€μ‚¬μ ˆ λ“±μ˜ λ³΅μž‘ν•œ 뢀사어 ꡬ 문에 λŒ€ν•΄μ„œλ„ λ¬Έλ²•μ μœΌλ‘œ μ²˜λ¦¬ν•  수 μžˆλŠ” λ°©μ•ˆμ„ μ œμ‹œν•œλ‹€. 그리고 μ΄λ ‡κ²Œ μ œμ‹œλœ λ°©μ•ˆμ„ κ΅¬ν˜„ν•œ ν…μŠ€νŠΈ μ• λ‹ˆλ©”μ΄μ…˜ μ‹œμŠ€ν…œμ„ ν†΅ν•˜μ—¬ μ• λ‹ˆλ©”μ΄μ…˜ 생성 κ²°κ³Όλ₯Ό ν™•μΈν•œλ‹€.

Automated Digital Cinematography with Natural Language Processing

Semin Jang
MS thesis, KAIST, 2004.

Automatic Translation of Korean into Korean Sign Language with Combinatory Categorial Grammar

Jiwon Choi
MS thesis, KAIST, 2004.

Applications to Molecular Interactions: Customized Visualization for Knowledge Discovery with Information Extraction

Changsu Lee
MS thesis, KAIST, 2004.
(Outstanding M.S. Thesis Award, 2004. 2.)

Anaphora Resolution for Contextually Appropriate Text Animation

Kyung Wha Hong
MS thesis, KAIST, 2004.

Annotation of Gene Products in the Literature with Gene Ontology Terms using Syntactic Dependencies

Jung-jae Kim and Jong C. Park
Lecture Notes on Artificial Intelligence, Post-Conference Book of IJCNLP-04, 2004.
We present a method for automatically annotating gene products in the literature with the terms of Gene Ontology (GO), which provides a dynamic but controlled vocabulary. Although GO is well-organized with such lexical relations as synonymy, ‘is-a’, and ‘part-of’ relations among its terms, GO terms show quite a high degree of morphological and syntactic variations in the literature. As opposed to the previous approaches that considered only restricted kinds of term variations, our method uncovers the syntactic dependencies between gene product names and ontological terms as well in order to deal with real-world syntactic variations, based on the observation that the component words in an ontological term usually appear in a sentence with established patterns of syntactic dependencies.

Information Visualization in 3-Dimensional Space for Text Data Mining

Jinah Park, Changsu Lee, and Jong C. Park
International Women's Conference on BIEN-Technology, Daejeon, Korea, November, 2003.

Analysis and Computational Processing of Sentences in Korean for Automatic Sign Language Generation

Jiwon Choi and Jong C. Park
Proceedings of the National Conference on Korean Language Processing, pp. 219-226, October, 2003.
Korean Sign Language has basic similarities to Korean, but unlike Korean, an agglutinative language of the auditory-vocal modality, it also exhibits the characteristics of an isolating language of the visual-motor modality. Therefore, to generate sign language automatically from Korean sentences in text form, it is necessary to analyze and exploit the meaning-conveying system specific to sign language, rather than forcing sign language expressions onto a grammar predefined for Korean. In this paper, we analyze the linguistic characteristics of sign language expressions in four categories, namely reproduction, omission, modification, and movement, and discuss how to process and implement these phenomena with combinatory categorial grammar.

Towards Automatic Sign Language Generation with Combinatory Categorial Grammar

Jiwon Choi and Jong C. Park
HCI Conference, pp. 481-486, Phoenix Park, Korea, February, 2003.
Sign language is a visual language for communication among people with hearing impairments, and so has a distinctive grammatical structure rarely found in other languages, which presuppose the use of speech. However, previous work on automatic sign language processing has put undue priority on linking sign language expressions to a grammar predefined for Korean, which has caused many problems in identifying and exploiting the meaning-conveying system specific to sign language. In particular, sign language can express without ambiguity the subject-object relation in inverted sentences and the agent-patient relation in causative and passive sentences, using not only cheremes such as hand movement and handshape but also mechanisms of simultaneous expression, and it can convey information efficiently by using previously designated spatial information as a kind of antecedent, thereby avoiding redundant expressions. In this paper, we report results on the components essential to automatically translating natural language expressions, such as those in Korean, into corresponding sign language expressions accompanied by animation, by analyzing them with combinatory categorial grammar. We also present, together with an implementation, a method for generating more natural sign language expressions that makes full use of the distinctive expressive devices of sign language.

Anaphora Resolution and Multi-Character Control for Automatic Generation of Multimedia Fairy Tales

Kyung Wha Hong and Jong C. Park
HCI Conference, pp. 487-492, Phoenix Park, Korea, February, 2003.
To generate automatically a multimedia fairy tale with animation that properly reflects the story, taking as input a fairy tale in the form of a document consisting of a sequence of sentences written in a natural language such as Korean, accurate interpretation of the various referential phenomena in the document is essential. Reference resolution for such animation shows a greater variety of types than the reference resolution usually studied in natural language processing to aid document understanding. In this paper, we present the implementation of a system that, in automatically generating multimedia fairy tales, generates commands for controlling a 3D virtual space by properly taking into account the movements of multiple characters along with referential phenomena in the sentences. Reference resolution for animation aims to identify the appropriate antecedent of a referring expression, which requires considering various kinds of scene information such as character names, actions, properties, events, and time. Controlling multiple characters in a way that fits the context requires, along with appropriate reference resolution, techniques that use various kinds of knowledge to give the characters natural movements. We show a system that analyzes fairy tales with combinatory categorial grammar and then automatically generates the corresponding control scripts for the Genesis 3D game engine.

Mediatory Visualization for Structured Data and Textual Information

Changsu Lee, Jinah Park, and Jong C. Park
The 3rd IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP 2003), pp. 926-932, Benalmadena, Spain, 2003.
When we visualize structured data for knowledge discovery, it is important that the users have easy access to the source textual information, especially when the mapping from the textual information to structured data is not perfect. In this paper, we present a new method for mediatory visualization for structured data and corresponding textual information to address this problem. The two dimensional space for visualizing structured data, such as the protein-protein interaction information collected from biomedical literature by information extraction, is linked perpendicularly to, but conceptually separated from, the pairwise one dimensional space for visualizing corresponding source textual data. The users can concentrate on the information in one space but explore the information in the other space as easily as one may manipulate objects in a three dimensional space. We show that the one dimensional color-banded rods give visual clues and insights to the nature of the underlying English sentence structures, which in turn give rise to useful feedback to the interaction information in the other two dimensional space, and vice versa.

Analysis and Computational Processing of Coordination Ambiguity in Korean

Hodong Lee and Jong C. Park
Journal of Language and Information, Vol. 7, No. 2, pp. 59-79, 2003.

Analysis and Processing of Korean with Quantifier Floating

Jin-Bok Lee and Jong C. Park
Journal of Language and Information, Vol. 7, No. 1, pp. 1-22, 2003.

Logical Representation of Ontological Terminologies in Biomedical Domain

Jung-jae Kim, Jin-Bok Lee, Hye-Jin Min, Ji-yong Jung, and Jong C. Park
Proceedings of the 2nd Annual Conference of The Korean Society for Bioinformatics (KSBI 2003), pp. 79-85, Daejeon, Korea, 2003.
This paper proposes a method that automatically recognizes protein names in large volumes of biomedical documents, identifies the properties of each protein from the documents, and links them to an existing ontology. Because ontology terms appear in documents in various forms, we automatically converted the terms into logical representations, and extracted and analyzed sentences describing protein properties in the documents to compare them with the logical representations of the ontology terms. We also proposed using natural language processing techniques, such as abbreviation handling and anaphora resolution, when recognizing protein properties in documents.

Morphological Analysis of Irregular Conjugation in Korean with Micro Combinatory Categorial Grammar

Ho-Joon Lee and Jong C. Park
Proceedings of the KISS Spring Conference, pp. 531-533, 2003.
In this paper, we discuss a method that uses morpheme-level combinatory categorial grammar to handle multiple stages of natural language processing, including morphological analysis, in a single derivation, and that uses information from higher levels of analysis to reduce the ambiguity and complexity that grow at the morphological analysis stage. We handle irregular conjugation of predicates, one of the complex linguistic phenomena in Korean, using not only probabilistic information but also higher-level information such as syntactic and semantic information, including phonological information, and examine the potential of the approach to develop into a general morphological analyzer.

Integrated Morphological Analysis for Korean in a Combinatory Categorial Grammar Framework

Ho-Joon Lee
MS thesis, KAIST, 2003.

Word Segmentation for Korean with Syllable-Level Combinatory Categorial Grammar

Ho-Joon Lee and Jong C. Park
Proceedings of the 14th National Conference on Korean Language Processing, pp. 47-54, October, 2002.
Word spacing in Korean has developed in a distinctive way, unlike English, which spaces regularly between words, and unlike Chinese and Japanese, in which spacing has not developed. Much earlier work focused on correcting partial spacing errors, but research is now active on methods that, in combination with work on character recognition and speech recognition, automatically restore spacing in sentences where it is entirely absent. In this paper, we review word spacing in Korean and previous work on spacing restoration, and account for forms that were difficult to handle with earlier methods using syllable-level combinatory categorial grammar.

Diphone-based Intonation and VoiceXML Document Generation using Multi-Dimensional Linguistic Information

Lee Hwa Jin and Jong C. Park
Proceedings of the 24th National Conference on Korean Language Processing, pp. 69-76, Cheongju, Korea, October, 2002.

Anaphora Resolution for Contextually Appropriate Animation of Multimedia Fairy Tales

Kyung Wha Hong and Jong C. Park
Proceedings of the 24th National Conference on Korean Language Processing, pp. 317-324, Cheongju, Korea, October, 2002.
Reference is the re-expression of information that has already been mentioned or is assumed to be already known. It is actively studied not only in natural language processing but also in cognitive science, psychology, and philosophy, and performance depends on the method used to select the antecedent of an anaphor, the referring expression. Reference resolution in the animation control script commands used to generate multimedia fairy tales from natural language sentences is essential for producing natural animation scenes based on appropriate reference to preceding information. In this paper, we examine the referential phenomena that appear in the natural language sentences of such fairy tales, and discuss a method for resolving them with combinatory categorial grammar and its implementation.

Analysis and Reconstruction of Temporal Relations in Multimedia Fairy Tales for Digital Cinematography

Semin Jang and Jong C. Park
Proceedings of the 24th National Conference on Korean Language Processing, pp. 309-316, Cheongju, Korea, October, 2002.
λ™ν™”λŠ” μ‚¬κ±΄μ˜ 흐름에 λ”°λΌμ„œ 이야기λ₯Ό μ§„ν–‰μ‹œν‚¨λ‹€. κ·ΈλŸ¬λ‚˜ λ…μžμΈ μ–΄λ¦°μ΄λ“€μ˜ 관심을 지 μ†μ μœΌλ‘œ μœ μ§€ν•˜κΈ° μœ„ν•˜μ—¬ 사건을 μ‹€μ œ μˆœμ„œμ™€ λ‹€λ₯΄κ²Œ λ°°μΉ˜ν•΄λ†“μ•„ 극적 효과λ₯Ό κΎ€ν•˜λŠ” κ²½μš°κ°€ 많이 μžˆλ‹€. 동화λ₯Ό μ• λ‹ˆλ©”μ΄μ…˜μœΌλ‘œ μƒμ„±ν•˜λŠ”λ° μžˆμ–΄μ„œ μ΄λŸ¬ν•œ μ‚¬κ±΄μ˜ λ°°μΉ˜μ— λ‹΄κΈ΄ μž‘κ°€μ˜ μ˜λ„λ₯Ό μ œλŒ€λ‘œ νŒŒμ•…ν•˜λŠ” 것은 μ€‘μš”ν•œ λ¬Έμ œμ΄λ‹€. λ³Έ λ…Όλ¬Έμ—μ„œλŠ” 이처럼 μ‚¬κ±΄μ˜ 흐 름을 νŒŒμ•…ν•˜κ³  이λ₯Ό ν™œμš©ν•˜κΈ° μœ„ν•΄μ„œ 닀루어야 ν•  언어적 μš”μ†Œλ“€μ— λŒ€ν•˜μ—¬ μ‚΄νŽ΄λ³΄κ³ , κ²° 합범주문법을 μ‚¬μš©ν•˜μ—¬ λ™ν™”μ—μ„œ λ‚˜νƒ€λ‚˜λŠ” μ‹œκ°„ 관계λ₯Ό λΆ„μ„ν•œλ‹€. λ˜ν•œ 각 μ‹œκ°„ 관계에 따라 μ• λ‹ˆλ©”μ΄μ…˜ 효과λ₯Ό 높이기 μœ„ν•œ μ˜μƒ 기법을 μ œμ•ˆν•˜κ³  이λ₯Ό μ΄μš©ν•˜μ—¬ μ‹œκ°„ 관계λ₯Ό μž¬ν˜„ν•˜λŠ” μ‹œμŠ€ν…œμ„ μ„€λͺ…ν•œλ‹€.

Automatic Gene Ontology Extension and Terminology Analysis

Jin-Bok Lee and Jong C. Park
Proceedings of the KISS Conference, pp. 229-231, Suwon, Korea, October, 2002.
Bioinformatics has become a major research field for handling the vast body of knowledge in biology efficiently. In particular, research on automatically extracting information from the biological literature is active, and by using the extraction results to extend useful knowledge bases such as the Gene Ontology automatically, the explosively growing body of research results in biology can be integrated into those knowledge bases. An automatically extended ontology goes through a verification step to ensure reliability, and is then used as a knowledge base for improving the performance of the information extraction system. In this study, we propose a system that extracts the conditions under which protein-protein interactions occur and a system that analyzes the extracted biological terms using the Gene Ontology, and discuss a system for automatic extension and verification of the Gene Ontology.

Accomplishments and Challenges in Literature Data Mining for Biology

Lynette Hirschman, Jong C. Park, Junichi Tsujii, Limsoon Wong, and Cathy Wu
Bioinformatics, Vol. 18, No. 12, pp. 1553-1561, 2002.
We review recent results in literature data mining for biology and discuss the need and the steps for a challenge evaluation for this field. Literature data mining has progressed from simple recognition of terms to extraction of interaction relationships from complex sentences, and has broadened from recognition of protein interactions to a range of problems such as improving homology search, identifying cellular location, and so on. To encourage participation and accelerate progress in this expanding field, we propose creating challenge evaluations, and we describe two specific applications in this context.

Natural Language Query Interpretation System for Biomedical Database Access

Hodong Lee and Jong C. Park
Proceedings of the KISS Spring Conference, pp. 487-489, Han Yang University, April 26-27, 2002.
This paper describes a natural language query system that enables conceptual access to biomedical information held in heterogeneous databases. To this end, the system converts queries into the formal database languages SQL, OQL, and CPL, and we show the query analysis and conversion steps this requires. Because the proposed method converts directly into the various formal languages using information derived by syntactic analysis, the system architecture becomes simpler and more modular, which can improve overall performance and portability.

Combinatory Categorial Grammar: from Natural Language Understanding to Biomedical Applications

Jong C. Park
Workshop on Natural Language Processing and Ontology Building for Biology, Tokyo, Japan, February, 2002.

Recent Issues in Biopathway Extraction from Literature

Jong C. Park
Institute for Mathematical Sciences (IMS), National University of Singapore, Singapore, February, 2002.

Challenges in Biopathway Extraction from Literature and Ontology Building for Biology

Jong C. Park
Korea Society for Bioinformatics Workshop, February, 2002.

Semi-Automatic Extension of Gene Ontology

Jin-Bok Lee, Jung-jae Kim, and Jong C. Park
Human Computer Interaction (HCI) Workshop, Phoenix Park, Korea, January, 2002.

Interpretation of Natural Language Queries for Relational Database Access with Combinatory Categorial Grammar

Hodong Lee and Jong C. Park
International Journal of Computer Processing of Oriental Languages (IJCPOL), Vol. 15, No. 3, pp. 281-304, 2002.
In this paper, we describe a proposal to derive formal language queries from natural language queries with a combinatory categorial grammar (CCG). CCGs are well known to provide a means of deriving all the levels of information for natural language, i.e., syntax, semantics and discourse, at the same time. In our proposal, we utilize an extra level of representation for formal language queries for the aforementioned derivation. The syntactic coverage is shown with various natural language queries, including compound nouns, modification markers, various types of ellipses, numerical expressions, and subordinate and coordinate constructions. The general purpose CCG lexicon is semi-automatically augmented with the database fields and entries. We also discuss the performance of an implemented natural language query processing system.

BiopathwayBuilder: Nested 3D visualization system for complex molecular interactions

Changsu Lee, Jinah Park, and Jong C. Park
Proceedings of International Conference on Genome Informatics (GIW), pp. 447-448, Tokyo, Japan, 2002.
In order to gain a full understanding of a biological process, we must be able to augment the known molecular interactions with discovered knowledge. We believe that a visualization system works as a means for accomplishing this task, as it provides an intuitive base for necessary information, among others. However, reported implementations have further problems: (1) The size of the information is not only enormous, but also grows very fast, which makes scalability and elision essential properties [5]; (2) the available information is not only incomplete, but also unreliable; and (3) the usual information in the field, such as protein modification [2], is inherently complex, which makes it very difficult to make the resulting visualization intuitive enough for end users as well as field experts. We address all the problems above with a 3D visualization system.

3D Visualization System for Complex Protein-Protein Interactions from Text Data Mining

Changsu Lee, Jinah Park, and Jong C. Park
IEEE Workshop on Visualization in Bioinformatics and Cheminformatics, Boston, USA, 2002.

Natural Language Interpretations for Heterogeneous Database Access

Hodong Lee and Jong C. Park
Proceedings of the International Conference on Computational Linguistics (COLING), pp. 523-529, Taiwan, 2002.

Text Data Mining for Automatic Gene Ontology Extension

Jin-Bok Lee and Jong C. Park
Intelligent Systems for Molecular Biology (ISMB), Proceedings of the second meeting of the special interest group on Text Data Mining, pp. 22-25, Edmonton, Alberta, Canada, 2002.

Literature Data Mining for Biology

Lynette Hirschman, Jong C. Park, Junichi Tsujii, Cathy Wu, and Limsoon Wong
Proceedings of the Pacific Symposium on Biocomputing (PSB) session, pp. 323-325, Hawaii, USA, 2002.

Natural Language Processing for Biomedical Information Extraction and Automatic Ontology Management

Jong C. Park
Proceedings of the 2nd Bioinformatics Forum, pp. 145-158, Seoul, Korea, 2002.

Diphone-based Intonation Generation for Korean with Combinatory Categorial Grammar

Lee Hwa Jin
MS thesis, KAIST, 2002.

Automatic Synthesis of Multimedia Tales with Combinatory Categorial Grammar

Hyun Sook Kim
MS thesis, KAIST, 2002.

Computational Processing of Honorifics in Korean with Combinatory Categorial Grammar

O Shik Kwon
MS thesis, KAIST, 2002.

Using Combinatory Categorial Grammar to Extract Biomedical Information

Jong C. Park
IEEE Intelligent Systems, Special Issue on Intelligent Systems in Biology, Vol. 16, No. 6, pp. 62-67, November-December, 2001.
Extracting information from biology databases manually can be an overwhelming task. GenBank, the US National Institutes of Health database containing all publicly available DNA sequences, has more than 14 billion bases in 13 million genetic-sequence records [1]. Medline, a literature database available through PubMed, has over 11 million journal citations. In a May 2001 search request for “cytokine” (regulatory proteins in the immune system), PubMed returned 296,556 articles [2]. Given the quantity and complexity of biomedical literature, demands for computational tools to extract specific information are increasing. In this article, I review biomedical information extraction methods and present research done by KAIST’s natural language processing group on a system that shows encouraging performance using combinatory categorial grammar (explained in detail below) as a natural language grammar formalism.

Biomedical Informatics and Natural Language Processing

Jong C. Park
Annual Meeting of the Korean Society for Medical Informatics, Jeon-ju, Korea, November, 2001.

Bioinformatics and Natural Language Processing

Jong C. Park
Special Issue in Korean Information Processing, Communications of the Korea Information Science Society (KISS), Vol. 19, No. 10, pp. 46-51, October, 2001.
Bioinformatics is a collective term for the research area that applies information processing techniques from fields such as computer science, mathematics, and statistics to efficiently produce, manage, and utilize the rapidly growing volume of information handled in biology. Natural language processing (NLP), in turn, is a collective term for the research area concerned with using computers to process sentences and documents expressed in natural languages such as Korean and English. It covers both research that supports human-computer interaction (HCI) and research on efficiently managing and utilizing vast amounts of natural language information. This article discusses how these two distinct research areas relate to each other by introducing research in the field of biomedical information extraction. Reflecting the recent rise in general interest in bioinformatics, the August 2000 issue of the Communications of the Korea Information Science Society [1] carried a special feature on bioinformatics. The present article is expected to complement that feature, one year later, with applications of natural language processing.

Automatic Augmentation of Translation Dictionary with Database Terminologies in Multilingual Query Interpretation

Hodong Lee and Jong C. Park
Annual Meeting of the Association for Computational Linguistics (ACL), Workshop on Human Language Technologies and Knowledge Management, pp. 113-120, Toulouse, France, July, 2001.
In interpreting multilingual queries to databases whose domain information is described in a particular language, we must address the problem of word sense disambiguation. Since full-fledged semantic classification information is difficult to construct either automatically or manually for this purpose, we propose to disambiguate the senses of the source lexical items by automatically augmenting a simple translation dictionary with database terminologies and describe an implemented multilingual query interpretation system in a combinatory categorial grammar framework.

Translating Natural Language Queries into Formal Language Queries with Combinatory Categorial Grammar

Hodong Lee and Jong C. Park
Proceedings of the International Conference on Computer Processing of Oriental Languages (ICCPOL), pp. 41-46, Seoul, Korea, May, 2001.

Computational Generation of Context-based Intonation for Korean with Combinatory Categorial Grammar

Lee Hwa Jin and Jong C. Park
Proceedings of the International Conference on Computer Processing of Oriental Languages (ICCPOL), pp. 415-420, Seoul, Korea, May, 2001.

Design and Implementation of E-Mail Response Management System for Call Center

Jung-jae Kim, O Shik Kwon, Hodong Lee, and Jong C. Park
Proceedings of the KISS Spring Conference, pp. 445-447, April, 2001.
This paper describes the server component of an automatic e-mail response and management system designed and implemented for call centers. We developed a domain-specific representation format and implemented an automatic responder with a more efficient three-stage matching method, a learning-based, domain-independent automatic classifier, and an agent dispatcher whose order of applied methods can be rearranged.

Bidirectional Incremental Parsing for Automatic Pathway Identification with Combinatory Categorial Grammar

Jong C. Park, Hyun Sook Kim, and Jung-jae Kim
Pacific Symposium on Biocomputing (PSB), pp. 396-407, Big Island, Hawaii, USA, January, 2001.
As the importance of automatically extracting and analyzing various natural language assertions about protein-protein interactions in biomedical publications is recognized, many uses of natural language processing techniques are proposed in the literature. However, most proposals to date make rather simplifying assumptions about the syntactic aspects of natural language due to various reasons including efficiency. In this paper, we describe an implemented system that utilizes combinatory categorial grammar, known to be competent in modeling natural language, with a controlled mechanism for the parser to operate bidirectionally and incrementally. We discuss the performance of the system on a large set of abstracts in Medline with quite encouraging results.

Real Time Synthesis of Multimedia Tales in Korean with Combinatory Categorial Grammar

Hyun Sook Kim and Jong C. Park
Proceedings of the National Conference on Korean Information Processing, pp. 509-512, 2001.

Computational Processing of Honorifics in Korean with Combinatory Categorial Grammar

O Shik Kwon and Jong C. Park
Proceedings of the National Conference on Korean Information Processing, pp. 365-372, 2001.
Korean and Japanese have highly developed honorific systems compared with Western languages such as English. In practice, however, not only non-native speakers but also many native speakers find it difficult to use these honorific systems accurately. Nevertheless, the ability to use the honorific system correctly, together with the ability to choose appropriate vocabulary, is regarded as an important linguistic skill for natural communication. In particular, when building machine translators or grammar checkers, a system that accurately understands the honorific system is essential for producing more natural expressions. In this paper, we examine the Korean honorific system, introduce a system that verifies honorific usage with combinatory categorial grammar, and evaluate its performance on scripts of historical dramas.

Generation of Contextually Appropriate Responses in E-Commerce with Combinatory Categorial Grammar

Jin-Bok Lee and Jong C. Park
Proceedings of the Human Computer Interaction (HCI) Symposium, pp. 314-319, Phoenix Park Convention Center, Korea, 2001.
We analyze various constructions in Korean including coordination, relative clauses, and embedded clauses by focusing on the phenomenon of quantifier floating where quantifying expressions may appear in places other than their original prenominal one. Based on these analyses, we process Korean sentences in a combinatory categorial grammar (CCG) framework that makes use of all the levels of syntax, semantics, and discourse. Finally, we describe an implemented query system that generates responses with contextually appropriate ellipsis in the domain of e-commerce.

Computational Processing of Floating Quantifiers in Korean with Combinatory Categorial Grammar

Jin-Bok Lee
MS thesis, KAIST, 2001.

Processing Floating Quantifiers with Combinatory Categorial Grammar

Jin-Bok Lee and Jong C. Park
The KISS Regional Conference, November, 2000.
In this paper, we consider quantifier floating in Korean from syntactic, semantic, and discourse perspectives in relation to complex linguistic phenomena such as coordinate, relative, and embedded constructions, and show that Korean sentences can be analyzed with combinatory categorial grammar. Based on this analysis, we suggest the possibility of building an interface capable of natural dialogue in domains such as e-commerce.

Predicting Contextually Appropriate Intonation from Utterances in Korean with Combinatory Categorial Grammar

Lee Hwa Jin and Jong C. Park
Proceedings of the National Conference on Korean Language Processing, pp. 68-75, October, 2000.
To express one's intention more accurately when communicating with another person, one must speak with intonation appropriate to the flow of the conversation. In this paper, we analyze sentences with combinatory categorial grammar and present a method that determines how intonational information such as pitch accent, pause, and emphasis should be realized according to intra-sentential and inter-sentential information, that is, context, and adds this information to the information structure of the sentence.

Customizable Natural Language Interfaces to Data Bases

Jong C. Park
Invited presentation, Pacific Symposium on Biocomputing (PSB), Honolulu, Hawaii, USA, January, 2000.

Combinatory Categorial Grammar and Natural Language Interface to Database

Hodong Lee and Jong C. Park
Proceedings of the Human-Computer Interaction (HCI) Triangle Workshop, pp. 900-905, Phoenix Park Convention Center, Korea, January, 2000.
In this paper, we discuss issues related to the construction of a natural language interface to databases, including the characteristics of natural language queries. We propose to implement the system using Combinatory Categorial Grammar (CCG), so that various linguistic phenomena can be handled incrementally and in a modular manner for diverged expressions.

Informed Parsing for Coordination with Combinatory Categorial Grammar

Jong C. Park and Hyung-joon Cho
Proceedings of the International Conference on Computational Linguistics (COLING), pp. 593-599, Saarbrücken, Germany, 2000.
Coordination in natural language hampers efficient parsing, especially due to the multiple and mostly unintended candidate conjuncts/disjuncts in a given sentence that shows structural ambiguity. The problem gets more serious in a combinatory categorial grammar framework, which is well known for its competent treatment of coordination, as the flexibility of syntactic analysis often strikes back as spurious ambiguity. We propose to address these ambiguities with predicate argument structures and semantic co-occurrence similarity information, and present encouraging results.

Combinatory Categorial Grammar for the Syntactic, Semantic, and Discourse Analyses of Coordinate Constructions in Korean

Hyung-joon Cho and Jong C. Park
Journal of the Korea Information Science Society (KISS), Vol. 27, No. 4, pp. 448-462, 2000.
Coordinate constructions in natural language pose a number of difficulties to natural language processing units, due to the increased complexity of syntactic analysis, the syntactic ambiguity of the involved lexical items, and the apparent deletion of predicates in various places. In this paper, we address the syntactic characteristics of the coordinate constructions in Korean from the viewpoint of constructing a competence grammar, and present a version of combinatory categorial grammar for the analysis of coordinate constructions in Korean. We also show how to utilize a unified lexicon in the proposed grammar formalism in deriving the sentential semantics and associated information structures as well, in order to capture the discourse functions of coordinate constructions in Korean. The presented analysis conforms to the common wisdom that coordinate constructions are utilized in language not simply to reduce multiple sentences to a single sentence, but also to convey the information of contrast. Finally, we provide an analysis of sample corpora for the frequency of coordinate constructions in Korean and discuss some problematic cases.

Combinatory Categorial Grammar for Natural Language Interface

Hodong Lee and Jong C. Park
Proceedings of the KISS Fall Conference, pp. 173-175, 2000.
In this study, we implement a natural language query interface for e-commerce databases using combinatory categorial grammar. To this end, we analyze queries and discuss how to represent them. We also show the lexical representations and the derivation method used to translate queries into the formal language SQL. The proposed method derives SQL queries directly during parsing, without the intermediate logical language translation step proposed in previous work, which simplifies the process and can improve system performance. The system is implemented as a web-based system with a client/server architecture.

Coordinate Constructions in Korean and Parsing Issues in Combinatory Categorial Grammar

Hyung-joon Cho
MS thesis, KAIST, 2000.

Combinatory Categorial Grammar and Parsing

Hyung-joon Cho and Jong C. Park
Proceedings of the National Conference on Korean Language Processing, pp. 223-230, Mokpo, Korea, October, 1999.
In this paper, we discuss spurious ambiguity and structurally ambiguous noun phrase coordination, both of which increase parsing complexity when Korean is processed with combinatory categorial grammar. We propose a way to reduce the complexity caused by spurious ambiguity by exploiting the fact that syntactic and semantic processing are performed simultaneously in combinatory categorial grammar. We also propose a way to reduce the complexity and misanalyses that arise in conjunct selection by using the co-occurrence similarity between the head nouns of the conjuncts, and then discuss possible improvements.

An Analysis of the Semantic and Discourse Functions of the Korean Special Marker '-to'

June K. Park and Jong C. Park
The National Conference on Korean Language Processing, Mokpo, Korea, October, 1999.
This paper deals with the semantic and contextual functions of Korean special markers, in particular '-to'. The marker '-to' plays an important role in connecting discourse naturally. A sentence containing '-to' always rests on a certain presupposition, and this presupposition is directly related not only to the meaning of the sentence but also to the preceding context. We describe five functions of '-to', namely 'sameness', 'similarity', 'extremity', 'addition', and its use in coordinate sentences, and show how to implement them in combinatory categorial grammar (CCG) using alternative semantics.

A CCG for Coordination in Korean

Hyung-joon Cho and Jong C. Park
Proceedings of the KISS Conference, pp. 327-329, Jeonju, Korea, April, 1999.
In natural language processing, coordinate sentences pose difficulties arising from the complexity of analysis, lexical ambiguity, and gaps. In this paper, we discuss the limitations of previously proposed categorial grammars for Korean and use combinatory categorial grammar to handle Korean coordinate sentences that those grammars could not. In doing so, we represent coordinate sentences in Korean, a non-configurational language, as predicate-argument structures and show that these structures can be used in machine translation systems.

Multiset-CCG for Quantifier Floating in Korean

Jin-Bok Lee and Jong C. Park
Proceedings of the KISS Conference, pp. 330-332, Jeonju, Korea, April, 1999.
In this paper, we survey the patterns in which quantifiers appear in Korean and discuss the phenomenon of quantifier floating (QF) among them. We show that QF is possible in the nominative, accusative, and dative cases, and explain the constraints on QF in embedded clauses. We then show that these phenomena can be accounted for in the framework of a multiset combinatory categorial grammar for Korean.

Lexical Selection with a Target Language Monolingual Corpus and an MRD

Hyun Ah Lee, Jong C. Park, and Gil Chang Kim
Proceedings of the Theoretical and Methodological Issues in Machine Translation (TMI), pp. 150-160, Chester, England, 1999.
In this paper, we propose a lexical selection method with three steps: sense disambiguation of source words, sense-to-word mapping, and selection of the most appropriate target language lexical item. The knowledge for each step is extracted from a machine readable dictionary and a target language monolingual corpus. By splitting the process of lexical selection into three steps and extracting the essential knowledge for each step from existing resources, our system can select appropriate words for translation with high extensibility and robustness.

Checking Grammatical Mistakes for English-as-a-Second-Language (ESL) Students

Jong C. Park, Martha Palmer, and Gay Washburn
Proceedings of the KSEA-NERC, New Brunswick, New Jersey, USA, April, 1997.

An English Grammar Checker as a Writing Aid for Students of English as a Second Language

Jong C. Park, Martha Palmer, and Gay Washburn
Conference on Applied Natural Language Processing (ANLP), Descriptions of System Demonstrations and Videos, Washington, D.C., USA, March, 1997.
We present a prototype grammar checker for English as a Second Language (ESL) students, utilizing Combinatory Categorial Grammar (CCG) written in SICStus Prolog. Instead of attempting to handle all possible grammatical errors, the grammar checker identifies certain specific types of grammatical mistakes that appear more regularly than others in the present domain of application.

Quantifier Scope and Constituency

Jong C. Park
The 33rd Annual Meeting of the Association for Computational Linguistics (ACL), Cambridge, Massachusetts, USA, June, 1995.
Traditional approaches to quantifier scope typically need stipulation to exclude readings that are unavailable to human understanders. This paper shows that quantifier scope phenomena can be precisely characterized by a semantic representation constrained by surface constituency, if the distinction between referential and quantificational NPs is properly observed. A CCG implementation is described and compared to other approaches.

Semantic Significance of Quantification in Natural Language Processing

Jong C. Park
Proceedings of the KSEA-NERC, pp. 432-436, New Brunswick, New Jersey, USA, March, 1995.

A Unification-based Semantic Interpretation for Coordinate Constructs

Jong C. Park
The 30th Annual Meeting of the Association for Computational Linguistics (ACL), Delaware, USA, June, 1992.
This paper shows that a first-order unification-based semantic interpretation for various coordinate constructs is possible without an explicit use of lambda expressions if we slightly modify the standard Montagovian semantics of coordination. This modification, along with partial execution, completely eliminates the lambda reduction steps during semantic interpretation.