News
Three students joined our lab!
News Announcements Published February 26, 2024
Changgeon Ko, Edyta Ołów, and Junmyeong Lee joined our lab for the master's course starting from this semester (Spring 2024). Welcome!
Professor Park serves as General Chair of IJCNLP-AACL 2023
News Announcements Published November 02, 2023
Professor Park serves as General Chair of IJCNLP-AACL 2023, the 17th International Joint Conference on Natural Language Processing (IJCNLP) and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (AACL), which is being held in Nusa Dua, Bali, Indonesia, during November 1-4, 2023.
SeungYoon joined our lab!
News Announcements Published August 28, 2023
SeungYoon Han joined our lab for the master's course starting from this semester (Fall 2023). Welcome!
Outstanding Paper Award at ACL 2023
News Announcements Published July 11, 2023
Congratulations to Fitsum Gaim, Wonsuk Yang, Hancheol Park, and Prof. Jong C. Park on receiving the Outstanding Paper Award at the Association for Computational Linguistics (ACL) 2023! Their paper, entitled 'Question-Answering in a Low-resourced Language: Benchmark Dataset and Models for Tigrinya,' introduced the TiQuAD dataset, which is the first-ever Question-Answering Dataset for Tigrinya. They deserve big congratulations for their hard work and achievement, and this is a huge honor for our lab.
Taeho joined our lab!
News Announcements Published February 27, 2023
Taeho Hwang joined our lab for the master's course starting from this semester (Spring 2023). Welcome!
Four new students joined our lab.
News Announcements Published August 24, 2022
Sukmin and Jisu joined our lab for the PhD course. KyungGeun and Jeong yeon joined our lab for the master's course starting from this semester (Fall 2022). Welcome!
Dongho Choi joined our lab!
News Announcements Published February 28, 2022
Dongho Choi joined our lab for the master's course starting from this semester (Spring 2022). Welcome!
Jeewoon Hong completed her research internship at our lab.
News Announcements Published December 31, 2021
Jeewoon Hong completed her research internship at our lab and performed all her duties conscientiously (August 2021 ~ December 2021). We wish her the best of luck in her future career.
Aujin joined our lab!
News Announcements Published September 01, 2021
Aujin Kim joined our lab for the master's course starting from this semester (Fall 2021). Welcome!
Young Ju Na completed her research internship at our lab.
News Announcements Published August 31, 2021
Young Ju Na completed her research internship at our lab and performed all her duties conscientiously (February 2021 ~ August 2021). We wish her the best of luck in her future career.
Emil Gasimov completed his summer research internship at our lab.
News Announcements Published August 31, 2021
Emil Gasimov completed his summer research internship at our lab and performed all his duties conscientiously (August 2021). We wish him the best of luck in his future career.
One new student joined our lab
News Announcements Published February 22, 2021
Sukmin joined our lab for the master's course starting from this semester. Welcome!
Two new students joined our lab
News Announcements Published August 31, 2020
ChaeHun and Jisu joined our lab for the PhD course and the master's course starting from this semester. Welcome!
Two new students joined our lab
News Announcements Published March 02, 2020
Euijun and Soyeong joined our lab for the PhD course and the master's course starting from this semester. Welcome!
Two new members joined our lab
News Announcements Published May 29, 2019
We are delighted to announce that Soo Hyun Ryu and Cholé Paris joined our lab as a researcher and an intern. Welcome!
A new student joined our lab
News Announcements Published March 04, 2019
Junseop joined our lab for the master's course starting this semester. Welcome!
A new student joined our lab
News Announcements Published November 06, 2018
Ada Carpenter joined our lab as an intern. Welcome!
Two new students joined our lab
News Announcements Published February 26, 2018
Seungwon and ChaeHun joined our lab as master's students starting from this semester. Welcome!
[Invited Talk] Embracing Noise in Bioinformatics (Dr. Chuan Hock Koh)
News Announcements Published March 18, 2014
Date: 2014. 3. 24. (Wed) 1:00 pm ~ 2:30 pm
Venue: Oh Sang Soo Room, Bldg. E3-1, CS Dept., KAIST
Host: Prof. Jong C. Park
Speaker: Dr. Chuan Hock Koh, Rakuten, Inc., Japan (http://www.kohchuanhock.com/)
Title: Embracing Noise in Bioinformatics
Abstract:
In the classical view of biology, noise has a negative connotation
associated with it. Therefore, one would often attempt to remove
“noise” from data using various statistical methods before any
downstream analysis.
There exist two types of noise in biological data; viz., observation
noise and system noise. While observation noise is caused by
experimental and/or measurement errors, system noise is inherently
an important part of a biological system that allows it to evolve and
adapt to the ever-changing environment.
Unfortunately, distinguishing observation noise from cell variation is
a daunting task, and meaningful cell variation would be inadvertently
removed whenever one attempts to eliminate “noise”.
Therefore, the philosophy that is undertaken throughout my thesis
is acknowledging that noise is inherent in biological systems,
and embracing it. More specifically, by embracing noise, we mean
to accept that noise is an inherent and important part of biological
systems. Therefore, instead of trying to measure and remove them,
we propose alternative ways to reduce and suppress them.
Short bio:
Chuan Hock is a bioinformatician by training, and currently a data
scientist by profession at Rakuten Japan, the largest ecommerce
company in Japan. He received his PhD from the National University
of Singapore in 2013. During the course of his PhD, he spent two years
as a visiting graduate student at the University of Tokyo. At the point of his
PhD submission, he had eleven peer-reviewed publications in journals
such as PNAS and Bioinformatics. In addition to biological data, he
has also worked with financial, social gaming, and most recently,
ebooks and ecommerce data.
If you don’t find him in front of a computer screen, you will probably
find him doing sports. He has competed in national and international
dragonboat races representing both his university and Singapore,
scaled mountains, and also completed an Ironman triathlon (which,
by the way, is one of the toughest triathlons on earth) that took him
15 hours.
[Invited Talk] Contextualising biomedical text mining: from facts to contradictions
News Announcements Published December 11, 2013
Date: 2013.12.16(Mon.) 15:00
Venue: Lecture Room Blue (B301), KI Bldg.
Host: Jong C. Park
Speaker:
Goran Nenadic, University of Manchester, UK
Title:
Contextualising biomedical text mining: from facts to contradictions
Abstract:
To date, progress in biomedical text mining research has primarily focused on entity recognition (locating mentions of species, genes, diseases, clinical findings, etc.) and the extraction of relationships (e.g. between genes/proteins, between diseases and genes etc.). Most of the extracted information is considered as facts and is not placed in the context that delineated the research reporting such facts. In this talk I will overview our efforts to contextualise the results of biomedical text mining by extracting the associated features that characterise the extracted facts. These include not only negation and speculation, but also associated species, anatomical locations, diseases, aging, etc. I will present several systems and automatically extracted knowledge bases that span themes from biology, bioinformatics and clinical practice.
Speaker Bio:
Dr Goran Nenadic is a Senior Lecturer in the School of Computer Science, University of Manchester, and is a group leader in the Manchester Interdisciplinary BioCenter (MIB). Previously, he was a lecturer in the School of Informatics, a post-doctoral research fellow in the same School (former Department of Computation, UMIST), a research fellow at the NLP group, University of Salford, UK, a teaching assistant at the Faculty of Mathematics, University of Belgrade and a visiting teaching assistant in Computational linguistics at the Faculty of Philology, University of Belgrade.
Goran has been working in the area of text mining and natural language processing since 1993. His research interests include terminology extraction, acquisition, classification and clustering (mainly in the domain of life sciences), relationship extraction, as well as interoperable architectures for text mining services and digital corpora encoding frameworks.
Currently, Goran Nenadic is a principal investigator on a BBSRC project that aims at extraction of associations among various types of entities from the biological literature (bio-MITA - Mining Term Associations from Literature to Support Knowledge Discovery in Biology).
[Invited Talk] OntoGene & SASEBio: biomedical text mining research at UZH
News Announcements Published December 11, 2013
Date: 2013.12.16 (Mon.) 9:30
Venue: Lecture Room Blue (B301), KI Bldg.
Host: Jong C. Park
Speaker:
Fabio Rinaldi, University of Zurich, Switzerland
Title:
OntoGene & SASEBio: biomedical text mining research at UZH
Abstract:
In this talk I will describe text mining activities conducted by the OntoGene research group (www.ontogene.org) at the University of Zurich (UZH). The OntoGene group is supported by the Swiss National Science Foundation (project SASEBio: Semi-Automated Semantic Enrichment of the Biomedical Literature) and by the Scientific Information Management group at Roche Pharmaceuticals. The SASEBio project focuses in particular on applications of text mining technologies to the process of biomedical database curation.
The OntoGene text mining system is based on a scalable entity recognition component with a semi-automated organism-based disambiguation module, an in-house dependency parser, and a flexible relation mining approach. The OntoGene team has participated in several biomedical text mining challenges (BioCreative, BioNLP, CALBC), obtaining competitive results in all of them. Some of these results will be discussed in the talk.
The OntoGene Document Inspector (ODIN) is an interactive tool which allows database curators to leverage the results of the OntoGene text mining system and use them in their curation tasks. One recent version of the system has been tested in the curation process of the Pharmacogenomics Knowledge Base (PharmGKB), and another version was adapted for the Comparative Toxicogenomics Database in the context of a BioCreative challenge.
Speaker Bio:
Fabio Rinaldi is the leader of the OntoGene research group at the University of Zurich and the principal investigator of the SASEBio project. He holds an MSc in Computer Science (University of Udine, Italy) and a PhD in Computational Linguistics (University of Zurich, Switzerland). He is author or co-author of 100+ scientific publications (including 19 journal papers) dealing with topics such as Ontologies, Text Mining, Text Classification, Document and Knowledge Management, Language Resources and Terminology.
The Korea-Europe Workshop on Biomedical Informatics and Natural Language Processing
News Announcements Published December 02, 2013
The Korea-Europe Workshop on Biomedical Informatics and Natural Language Processing will be held on 16th Dec., 2013 at Lecture Room Blue (B301) in KI Building.
Invited talks:
Goran Nenadic (University of Manchester, UK)
- Contextualising biomedical text mining: from facts to contradictions
Fabio Rinaldi (University of Zurich, Switzerland)
- OntoGene & SASEBio: biomedical text mining research at UZH
Hyunju Lee (Gwangju Institute of Science and Technology, Korea)
- DNA copy number aberrations in cancer and evidence-based text mining for cancer
5th International Symposium on Languages in Biology and Medicine (LBM 2013)
News Announcements Published August 30, 2013
The Fifth International Symposium on Languages in Biology and Medicine (LBM 2013) will be held at the University of Tokyo, Japan on December 12th and 13th. LBM is a biennial interdisciplinary forum that brings together researchers in biology, chemistry, medicine, public health and informatics to discuss and exploit cutting-edge language technologies.
(Website: http://lbm2013.biopathway.org)
The 11th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing
News Announcements Published February 22, 2013
The 11th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing (http://ksw2013.biopathway.org) will be held in Oh Sang Soo Seminar Room at KAIST on 22nd February, 2013.
Chairs
Jong C. Park (KAIST, Korea)
Limsoon Wong (NUS, Singapore)
Venue
Date: 22nd February, 2013
Location: Oh Sang Soo Seminar Room, CS Bldg., KAIST, Daejeon, South Korea
Speaker | Title
Jae Sook Cheong (ETRI, Korea) | Tag Graph: a graph-based tagging system for files
Kwoh Chee Keong (NTU, Singapore) | Drug-target interaction prediction by learning from local information and neighbors
Jung-jae Kim (NTU, Singapore) | Biomedical ontology alignment for equivalence and subsumption correspondences
Wing Kin Sung (NUS, Singapore) | Structural variation identification and its applications in decoding cancer genome
Hyunju Lee (GIST, Korea) | Integrative approaches for DNA copy number aberrations in cancer
5th International Symposium on Semantic Mining in Biomedicine (SMBM)
News Announcements Published July 27, 2012
The 5th International Symposium on Semantic Mining in Biomedicine
(SMBM) will be held at the Institute of Computational Linguistics,
University of Zurich, Switzerland on 3rd-4th September, 2012.
(website: http://www.smbm.eu)
[Announcement] Maria Wolters' seminar: Designing Reminder Systems for Older People - What is the Context?
News Announcements Published July 03, 2012
Dr. Maria Wolters from the University of Edinburgh will give a talk at KAIST on July 16th as follows:
Title: Designing Reminder Systems for Older People - What is the Context?
Speaker: Dr. Maria Wolters, University of Edinburgh
Date: July 16 (Monday), 2012
Time: 1:00pm ~ 2:00pm
Place: Ahn Young-Kyung Seminar Room, CS Building (#4420, E3-1)
Host: Jong C. Park (park@cs.kaist.ac.kr)
Abstract:
In the MultiMemoHome project, we aim to develop guidelines for
designing acceptable and effective reminders for supporting older
people who live in the community. An important part of the design
is to consider the context in which reminders are delivered. Is the
person who needs the reminders living alone or with family? How
large is their social network? How close are they to other people
who could help? What technology do they have in their home that
could be used to display reminders? What sensory and cognitive
impairments do they have that might preclude reminder use? We
will look at those questions using data from the English Longitudinal
Survey of Ageing, a large-scale study of thousands of older people
in the UK. The talk will conclude with suggestions for relevant
cross-cultural comparisons.
Biography:
Maria Wolters is a senior research fellow at the Centre for Speech
Technology Research at the University of Edinburgh. The goal of
her research is to improve the accessibility and functionality of
voice-based interaction. She is also interested in the quantitative
analyses of large quantities of language data and in the acoustic
analysis of disordered speech.
She studied at the University of Bonn, where she attained an MSc
in Computer Science in 1997 and a PhD in Communication Research
and Phonetics in 2001. She went on to join the University of Newcastle
and Queen Margaret University as a clinical phonetician, before moving
to the University of Edinburgh in late 2004. She is currently a research
fellow on the MATCH project.
[Announcement] Prof. Bonnie Webber's Global Lecture: Discourse in Language Technology
News Announcements Published June 15, 2012
Prof. Bonnie Webber from the University of Edinburgh will give a Global Lecture at KAIST on July 17th and 19th as follows:
Title: Discourse in Language Technology
Speaker: Professor Bonnie Webber
Affiliation: School of Informatics, University of Edinburgh (UK)
Time:
- Lecture 1: July 17 (Tuesday), 2012, 2:00 PM ~ 5:00 PM
- Lecture 2: July 19 (Thursday), 2012, 2:00 PM ~ 5:00 PM
Place: Oh Sang-Su Seminar Room, CS Building (#4443, E3-1)
Host: Professor Jong C. Park (박종철 교수)
Lecture Description
The discourse properties of text have long been recognized as critical to Language Technology, and over the past 40 years, our understanding of and ability to exploit these properties have grown in many ways. The goal of these two lectures is to recount these developments, the technologies they employ, the applications they support, and the new challenges that each subsequent development has raised. The audience will thus be introduced to viable notions of discourse structure that have emerged over the past two decades, and how they are being used to improve the performance of systems for information extraction, summarization, essay analysis and grading, sentiment detection and opinion mining, and machine translation.
LECTURE 1:
■ Description of several complementary bases that organize and structure texts, along with a description of their different formal properties.
■ Description of state-of-the-art algorithms for recognizing the different forms of text structure, along with resources used by these algorithms for training and/or testing (or that will soon be available for these purposes).
LECTURE 2:
■ Description of applications of these algorithms in automated essay evaluation, summarization, information extraction, opinion mining and sentiment detection.
■ Description of current and future applications of discourse structure in machine translation (MT), or facilitated through MT.
The lectures draw from recent articles and monographs, including [Webber et al, 2012], [Webber and Joshi, 2012], and [Stede, 2011]. They do not assume detailed linguistic knowledge on the part of audience members, but they do assume that the audience will have had some exposure to text, even if only as bags of words or N-gram language models.
Biography
Fellow, American Association for Artificial Intelligence (AAAI)
Vice President, Association for Computational Linguistics (ACL), 1979
President, Association for Computational Linguistics (ACL), 1980
Co-chair (with Benjamin Kuipers), National Conference on Artificial Intelligence (AAAI-97), 1997
PhD: Harvard University (1978)
Bibliography
Manfred Stede (2011). Discourse Processing. Morgan and Claypool Publishers.
Bonnie Webber and Aravind Joshi (2012). Discourse Structure and Computation: Past, Present and Future. Proc. ACL Workshop on Rediscovering 50 Years of Discoveries. Jeju Island, Korea.
Bonnie Webber, Markus Egg and Valia Kordoni (2012). Discourse Structure and Language Technology. Natural Language Engineering, doi:10.1017/S1351324911000337.
Contact: Hye-Jin Min (hjmin@nlp.kaist.ac.kr, T. 7741)
[Announcement] Prof. Anoop Sarkar's talk: Two Methods for Morpheme-based Machine Translation
News Announcements Published June 14, 2012
Prof. Anoop Sarkar from Simon Fraser University will give a talk at KAIST on July 2nd as follows:
Title: Two Methods for Morpheme-based Machine Translation
Speaker: Professor Anoop Sarkar (joint work with Ann Clifton and Young-chan Kim)
Affiliation: School of Computing Science, Simon Fraser University, BC (Canada)
Time: July 2 (Monday), 2012, 2:00 PM ~ 3:00 PM
Place: Oh Sang-Su Seminar Room, CS Building (#4443, E3-1)
Host: Professor Jong C. Park (박종철 교수)
Abstract:
Statistical machine translation systems learn how to translate by
training on large amounts of previously translated text. The machine
learning models used typically assume that the unit of translation
is pre-defined (defined by an observed word boundary). As a result,
these methods tend to perform poorly when translating into languages
like Finnish or Korean with very complex morphological systems with
a large vocabulary.
This talk is about our ongoing work in the use of unsupervised
morpheme segmentation methods for machine translation. We investigate
four methods: a generative model for generating word forms during
translation (factored translation), translation based on word
segments (sub-word translation), probability models for generating
word forms in the target language (morphology generation) and
sub-word alignment models that infer morphemes that can improve
alignment between source and target languages (sub-word alignment).
The first three were evaluated on English to Finnish and the last
method on English to Korean translation.
We find that morphology-aware translation models yield significantly
more fluent translations compared to a state-of-the-art word-based
baseline. We perform linguistic analysis of the output to show that
morpheme-aware translations are more fluent and we show improvements
in automatic evaluation scores like BLEU.
Biography:
Anoop Sarkar is an Associate Professor in Computing Science at Simon Fraser
University in British Columbia, Canada where he co-directs the Natural
Language Laboratory (http://natlang.cs.sfu.ca). He received his Ph.D. from
the Department of Computer and Information Sciences at the University of
Pennsylvania under Prof. Aravind Joshi for his work on semi-supervised
statistical parsing using tree-adjoining grammars. His favorite machine
translation system is kriya, the hierarchical phrase-based system developed
in his lab at SFU.
His research is focused on statistical parsing and machine translation
(exploiting syntax or morphology, or both). His interests also include
semi-supervised learning algorithms and stochastic grammars, in particular
tree automata and tree-adjoining grammars.
http://www.cs.sfu.ca/~anoop
Contact: Hye-Jin Min (hjmin@nlp.kaist.ac.kr, T. 7741)
[Announcement] The 10th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing
News Announcements Published December 01, 2011
The 10th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing (http://ksw2011.biopathway.org) will be held in Oh Sang Soo Seminar Room at KAIST on 12th December, 2011.
Chairs
Jong C. Park (KAIST, Korea)
Limsoon Wong (NUS, Singapore)
Venue
Date: 12th December, 2011
Location: Oh Sang Soo Seminar Room, CS Bldg., KAIST, Daejeon, South Korea
The Fourth International Symposium on Languages in Biology and Medicine (LBM 2011)
News Announcements Published June 05, 2011
The Fourth LBM symposium (LBM 2011, http://lbm2011.biopathway.org) will be held at the Nanyang Technological University (NTU), Singapore as a pre-conference workshop of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), on December 14th and 15th, 2011.
[Announcement] Prof. Junichi Tsujii's Global Lecture: NLP-based Text Mining Techniques and their Applications
News Announcements Published May 13, 2010
Prof. Junichi Tsujii at the University of Tokyo will give a Global Lecture at KAIST as follows.
Title: NLP-based Text Mining Techniques and their Applications
Speaker: Junichi Tsujii
Date: May 31 - June 4, 2010 (1pm-4pm)
Location: Oh Sangsu Seminar Room
Host: Jong C. Park (park@cs.kaist.ac.kr)
[Course Description]
Text Mining is considered an essential technology for the future of biological research, providing the means by which scientists can cope with the ever-increasing number of published papers in the domain. This course focuses on an emerging technological field, NLP-based Text Mining, which combines technologies such as natural language processing, ontology engineering, machine learning and distributed databases. In particular, the course discusses how recent research results of deep parsing can be combined with machine learning for event recognition and relation mining in biology.
[Day 1] Challenges of Text Mining for Biology
[Day 2] Deep parsing and linguistic formalism
[Day 3] Empirical Approach to Meaning
[Day 4] Named Entity Recognition and Normalization
[Day 5] Event Recognition and Normalization
[Speaker's Bio]
Junichi Tsujii is a Professor of Computational Linguistics and Natural Language Processing at the University of Tokyo and Professor of Text Mining at the University of Manchester, UK. He has an MSc and a PhD from the Department of Electrical Engineering, Kyoto University. He has been a permanent member of the International Committee of Computational Linguistics (ICCL) since 1994. He was Vice-President (2005) and President (2006) of ACL (Association for Computational Linguistics), and President (2008) of AFNLP (Asian Federation of Natural Language Processing). He was awarded the IBM Science Award in 1988, a SEYMF Visiting Professorship in 2000, the Daiwa-Adrian Prize in 2004 for a project jointly carried out with Dr. S. Ananiadou (University of Manchester, UK), the IBM Faculty Award in 2005, the Achievement Award (Japan Society of Artificial Intelligence) in 2008, and the Medal with Purple Ribbon (紫綬褒章, the Japanese Government) in 2010.
[Announcement] 1st CALBC Workshop
News Announcements Published March 29, 2010
The first CALBC ("Collaborative annotation of a large-scale biomedical corpus") workshop is to be held on 19/20 April 2010 at EBI (European Bioinformatics Institute, U.K.).
Location: European Bioinformatics Institute, Hinxton, Cambridge, U.K.
Date: 19/20 April 2010
The CALBC project is creating a broadly-scoped and diversely annotated corpus (several 100,000 Medline abstracts on immunology annotated with about a dozen semantic groups) by automatically integrating the annotations from different named entity recognition and concept identification systems. The result of the integration process will be a silver standard corpus (SSC).
The CALBC challenge, announced earlier to deal with biomedical named entity recognition (NER) and identification of entities, is now closed. At the CALBC workshop, participants will discuss the outcome of the challenge. The CALBC project partner will explain in detail previous work on the corpus and will present the results from the challenge. Participants will present their work to meet the demands of the challenge.
The CALBC workshop offers the unique opportunity to learn more about this unusual approach to generate a large-scale annotated corpus. One highlight is the session about the exploitation of the scientific literature in Semantic Web applications. The workshop participants and members of the pharmaceutical industry will discuss how Semantic Web applications will profit from semantic enrichment of the scientific literature (as provided from the harmonised CALBC corpus).