News

[Invited Talk] Embracing Noise in Bioinformatics (Dr. Chuan Hock Koh)

News Announcements Published March 18, 2014

Date: 2014. 3. 24. (Wed) 1:00 pm ~ 2:30 pm
Venue: Oh Sang Soo Room, Bldg. E3-1, CS Dept., KAIST
Host: Prof. Jong C. Park

Speaker: Dr. Chuan Hock Koh, Rakuten, Inc., Japan (http://www.kohchuanhock.com/)

Title: Embracing Noise in Bioinformatics

Abstract:
In the classical view of biology, noise has a negative connotation associated with it. Therefore, one would often attempt to remove "noise" from data using various statistical methods before any downstream analysis.

There exist two types of noise in biological data; viz., observation noise and system noise. While observation noise is caused by experimental and/or measurement errors, system noise is inherently an important part of a biological system that allows it to evolve and adapt to the ever-changing environment.

Unfortunately, distinguishing observation noise from cell variation is a daunting task, and meaningful cell variation would be inadvertently removed whenever one attempts to eliminate "noise".

Therefore, the philosophy adopted throughout my thesis is to acknowledge that noise is inherent in biological systems and to embrace it. More specifically, by embracing noise, we mean accepting that noise is an inherent and important part of biological systems. Therefore, instead of trying to measure and remove it, we propose alternative ways to reduce and suppress it.

Short bio:
Chuan Hock is a bioinformatician by training and currently a data scientist by profession at Rakuten Japan, the largest e-commerce company in Japan. He received his PhD from the National University of Singapore in 2013. During the course of his PhD, he spent two years as a visiting graduate student at the University of Tokyo. At the time of his PhD submission, he had eleven peer-reviewed publications in journals such as PNAS and Bioinformatics. In addition to biological data, he has also worked with financial, social gaming, and most recently, e-book and e-commerce data.

If you don't find him in front of a computer screen, you will probably find him doing sports. He has competed in national and international dragonboat races representing both his university and Singapore, scaled mountains, and also completed an Ironman triathlon (which, by the way, is one of the toughest triathlons on earth) that took him 15 hours.



[Invited Talk] Contextualising biomedical text mining: from facts to contradictions

News Announcements Published December 11, 2013

Date: 2013.12.16(Mon.) 15:00
Venue: Lecture Room Blue (B301), KI Bldg.
Host: Jong C. Park

Speaker:
Goran Nenadic, University of Manchester, UK

Title:
Contextualising biomedical text mining: from facts to contradictions

Abstract:
To date, progress in biomedical text mining research has primarily focused on entity recognition (locating mentions of species, genes, diseases, clinical findings, etc.) and the extraction of relationships (e.g. between genes/proteins, between diseases and genes, etc.). Most of the extracted information is considered as facts and is not placed in the context that delineated the research reporting such facts. In this talk I will give an overview of our efforts to contextualise the results of biomedical text mining by extracting the associated features that characterise the extracted facts. These include not only negation and speculation, but also associated species, anatomical locations, diseases, aging, etc. I will present several systems and automatically extracted knowledge bases that span themes from biology, bioinformatics and clinical practice.

Speaker Bio:
Dr Goran Nenadic is a Senior Lecturer in the School of Computer Science, University of Manchester, and is a group leader in the Manchester Interdisciplinary BioCenter (MIB). Previously, he was a lecturer in the School of Informatics, a post-doctoral research fellow in the same School (former Department of Computation, UMIST), a research fellow at the NLP group, University of Salford, UK, a teaching assistant at the Faculty of Mathematics, University of Belgrade, and a visiting teaching assistant in Computational Linguistics at the Faculty of Philology, University of Belgrade. Goran has been working in the area of text mining and natural language processing since 1993. His research interests include terminology extraction, acquisition, classification and clustering (mainly in the domain of life sciences), relationship extraction, as well as interoperable architectures for text mining services and digital corpora encoding frameworks. Currently, Goran Nenadic is a principal investigator on a BBSRC project that aims to extract associations among various types of entities from the biological literature (bio-MITA - Mining Term Associations from Literature to Support Knowledge Discovery in Biology).


 

[Invited Talk] OntoGene & SASEBio: biomedical text mining research at UZH

News Announcements Published December 11, 2013

Date: 2013.12.16 (Mon.) 9:30
Venue: Lecture Room Blue (B301), KI Bldg.
Host: Jong C. Park

Speaker:
Fabio Rinaldi, University of Zurich, Switzerland

Title:
OntoGene & SASEBio: biomedical text mining research at UZH

Abstract:
In this talk I will describe text mining activities conducted by the OntoGene research group (www.ontogene.org) at the University of Zurich (UZH). The OntoGene group is supported by the Swiss National Science Foundation (project SASEBio: Semi-Automated Semantic Enrichment of the Biomedical Literature) and by the Scientific Information Management group at Roche Pharmaceuticals. The SASEBio project focuses in particular on applications of text mining technologies to the process of biomedical database curation. The OntoGene text mining system is based on a scalable entity recognition component with a semi-automated organism-based disambiguation module, an in-house dependency parser, and a flexible relation mining approach. The OntoGene team has participated in several biomedical text mining challenges (BioCreative, BioNLP, CALBC), obtaining competitive results in all of them. Some of these results will be discussed in the talk. The OntoGene Document Inspector (ODIN) is an interactive tool that allows database curators to leverage the results of the OntoGene text mining system in their curation tasks. One recent version of the system has been tested in the curation process of the Pharmacogenomics Knowledge Base (PharmGKB), and another version was adapted for the Comparative Toxicogenomics Database in the context of a BioCreative challenge.

Speaker Bio:
Fabio Rinaldi is the leader of the OntoGene research group at the University of Zurich and the principal investigator of the SASEBio project. He holds an MSc in Computer Science (University of Udine, Italy) and a PhD in Computational Linguistics (University of Zurich, Switzerland). He is the author or co-author of 100+ scientific publications (including 19 journal papers) dealing with topics such as Ontologies, Text Mining, Text Classification, Document and Knowledge Management, Language Resources and Terminology.



The Korea-Europe Workshop on Biomedical Informatics and Natural Language Processing

News Announcements Published December 02, 2013

The Korea-Europe Workshop on Biomedical Informatics and Natural Language Processing will be held on 16th Dec., 2013 at Lecture Room Blue (B301) in KI Building.

Invited talks:

Goran Nenadic (University of Manchester, UK)
- Contextualising biomedical text mining: from facts to contradictions

Fabio Rinaldi (University of Zurich, Switzerland)
- OntoGene & SASEBio: biomedical text mining research at UZH

Hyunju Lee (Gwangju Institute of Science and Technology, Korea)
- DNA copy number aberrations in cancer and evidence-based text mining for cancer



5th International Symposium on Languages in Biology and Medicine (LBM 2013)

News Announcements Published August 30, 2013

The Fifth International Symposium on Languages in Biology and Medicine (LBM 2013) will be held at the University of Tokyo, Japan on December 12th and 13th. LBM is a biennial interdisciplinary forum that brings together researchers in biology, chemistry, medicine, public health and informatics to discuss and exploit cutting-edge language technologies.
(Website: http://lbm2013.biopathway.org)



The 11th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing

News Announcements Published February 22, 2013

The 11th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing (http://ksw2013.biopathway.org) will be held in Oh Sang Soo Seminar Room at KAIST on 22nd February, 2013.

 

Chairs
Jong C. Park (KAIST, Korea)
Limsoon Wong (NUS, Singapore)

 

Venue
Date: 22nd February, 2013
Location: Oh Sang Soo Seminar Room, CS Bldg., KAIST, Daejeon, South Korea 

 

Speakers and titles:

Jae Sook Cheong (ETRI, Korea)
- Tag Graph: a graph-based tagging system for files

Kwoh Chee Keong (NTU, Singapore)
- Drug-target interaction prediction by learning from local information and neighbors

Jung-jae Kim (NTU, Singapore)
- Biomedical ontology alignment for equivalence and subsumption correspondences

Wing Kin Sung (NUS, Singapore)
- Structural variation identification and its applications in decoding cancer genome

Hyunju Lee (GIST, Korea)
- Integrative approaches for DNA copy number aberrations in cancer



5th International Symposium on Semantic Mining in Biomedicine (SMBM)

News Announcements Published July 27, 2012

The 5th International Symposium on Semantic Mining in Biomedicine (SMBM) will be held at the Institute of Computational Linguistics, University of Zurich, Switzerland on 3rd-4th September, 2012.
(website: http://www.smbm.eu)


 

[Announcement] Maria Wolters' seminar: Designing Reminder Systems for Older People - What is the Context?

News Announcements Published July 03, 2012

Dr. Maria Wolters from the University of Edinburgh will give a talk at KAIST on July 16th as follows:

Title: Designing Reminder Systems for Older People - What is the Context?
Speaker: Dr. Maria Wolters, University of Edinburgh
Date: July 16 (Monday), 2012
Time: 1:00pm ~ 2:00pm
Place: Ahn Young-Kyung Seminar Room, CS Building (#4420, E3-1)
Host: Jong C. Park (park@cs.kaist.ac.kr)

Abstract: In the MultiMemoHome project, we aim to develop guidelines for designing acceptable and effective reminders for supporting older people who live in the community. An important part of the design is to consider the context in which reminders are delivered. Is the person who needs the reminders living alone or with family? How large is their social network? How close are they to other people who could help? What technology do they have in their home that could be used to display reminders? What sensory and cognitive impairments do they have that might preclude reminder use? We will look at those questions using data from the English Longitudinal Study of Ageing, a large-scale study of thousands of older people in the UK. The talk will conclude with suggestions for relevant cross-cultural comparisons.

Biography: Maria Wolters is a senior research fellow at the Centre for Speech Technology Research at the University of Edinburgh. The goal of her research is to improve the accessibility and functionality of voice-based interaction. She is also interested in the quantitative analyses of large quantities of language data and in the acoustic analysis of disordered speech.

She studied at the University of Bonn, where she attained an MSc in Computer Science in 1997 and a PhD in Communication Research and Phonetics in 2001. She went on to join the University of Newcastle and Queen Margaret University as a clinical phonetician, before moving to the University of Edinburgh in late 2004. She is currently a research fellow on the MATCH project.


 

[Announcement] Prof. Bonnie Webber's Global Lecture: Discourse in Language Technology

News Announcements Published June 15, 2012

Prof. Bonnie Webber from the University of Edinburgh will give a Global Lecture at KAIST on July 17th and 19th as follows:

Title: Discourse in Language Technology
Speaker: Professor Bonnie Webber
Affiliation: School of Informatics, University of Edinburgh (UK)
Time:
- Lecture 1: July 17 (Tuesday), 2012, 2:00 PM ~ 5:00 PM
- Lecture 2: July 19 (Thursday), 2012, 2:00 PM ~ 5:00 PM
Place: Oh Sang-Su Seminar Room, CS Building (#4443, E3-1)
Host: Professor Jong C. Park

Lecture Description
The discourse properties of text have long been recognized as critical to Language Technology, and over the past 40 years, our understanding of and ability to exploit these properties have grown in many ways. The goal of these two lectures is to recount these developments, the technologies they employ, the applications they support, and the new challenges that each subsequent development has raised. The audience will thus be introduced to viable notions of discourse structure that have emerged over the past two decades, and how they are being used to improve the performance of systems for information extraction, summarization, essay analysis and grading, sentiment detection and opinion mining, and machine translation.

LECTURE 1:
- Description of several complementary bases that organize and structure texts, along with a description of their different formal properties.
- Description of state-of-the-art algorithms for recognizing the different forms of text structure, along with resources used by these algorithms for training and/or testing (or that will soon be available for these purposes).

LECTURE 2:
- Description of applications of these algorithms in automated essay evaluation, summarization, information extraction, opinion mining and sentiment detection.
- Description of current and future applications of discourse structure in machine translation (MT), or facilitated through MT.

The lectures draw from recent articles and monographs, including [Webber et al, 2012], [Webber and Joshi, 2012], and [Stede, 2011]. They do not assume detailed linguistic knowledge on the part of audience members, but they do assume that the audience will have had some exposure to text, even if only as bags of words or N-gram language models.

Biography

Fellow, American Association for Artificial Intelligence (AAAI)
Vice President, Association for Computational Linguistics (ACL), 1979
President, Association for Computational Linguistics (ACL), 1980
Co-chair (with Benjamin Kuipers), National Conference on Artificial Intelligence (AAAI-97), 1997

PhD: Harvard University (1978)

Bibliography
Manfred Stede (2011). Discourse Processing. Morgan and Claypool Publishers.

Bonnie Webber and Aravind Joshi (2012). Discourse Structure and Computation: Past, Present and Future.
Proc. ACL Workshop on Rediscovering 50 Years of Discoveries. Jeju Island, Korea.

Bonnie Webber, Markus Egg and Valia Kordoni (2012). Discourse Structure and Language Technology.
Natural Language Engineering, doi:10.1017/S1351324911000337.

Contact: Hye-Jin Min (hjmin@nlp.kaist.ac.kr, T. 7741)


 

[Announcement] Prof. Anoop Sarkar's talk: Two Methods for Morpheme-based Machine Translation

News Announcements Published June 14, 2012

Prof. Anoop Sarkar from Simon Fraser University will give a talk at KAIST on July 2nd as follows:

Title: Two Methods for Morpheme-based Machine Translation
Speaker: Professor Anoop Sarkar (joint work with Ann Clifton and Young-chan Kim)
Affiliation: School of Computing Science, Simon Fraser University, BC (Canada)
Time: July 2 (Monday), 2012, 2:00 PM ~ 3:00 PM
Place: Oh Sang-Su Seminar Room, CS Building (#4443, E3-1)
Host: Professor Jong C. Park

Abstract:
Statistical machine translation systems learn how to translate by training on large amounts of previously translated text. The machine learning models used typically assume that the unit of translation is pre-defined (defined by an observed word boundary). As a result, these methods tend to perform poorly when translating into languages like Finnish or Korean, which have very complex morphological systems and large vocabularies.

This talk is about our ongoing work in the use of unsupervised morpheme segmentation methods for machine translation. We investigate four methods: a generative model for generating word forms during translation (factored translation), translation based on word segments (sub-word translation), probability models for generating word forms in the target language (morphology generation) and sub-word alignment models that infer morphemes that can improve alignment between source and target languages (sub-word alignment).

The first three were evaluated on English to Finnish and the last method on English to Korean translation.

We find that morphology-aware translation models yield significantly more fluent translations compared to a state-of-the-art word-based baseline. We perform linguistic analysis of the output to show that morpheme-aware translations are more fluent, and we show improvements in automatic evaluation scores like BLEU.

Biography:
Anoop Sarkar is an Associate Professor in Computing Science at Simon Fraser University in British Columbia, Canada where he co-directs the Natural Language Laboratory (http://natlang.cs.sfu.ca). He received his Ph.D. from the Department of Computer and Information Sciences at the University of Pennsylvania under Prof. Aravind Joshi for his work on semi-supervised statistical parsing using tree-adjoining grammars. His favorite machine translation system is kriya, the hierarchical phrase-based system developed in his lab at SFU.

His research is focused on statistical parsing and machine translation (exploiting syntax or morphology, or both). His interests also include semi-supervised learning algorithms and stochastic grammars, in particular tree automata and tree-adjoining grammars.

http://www.cs.sfu.ca/~anoop

Contact: Hye-Jin Min (hjmin@nlp.kaist.ac.kr, T. 7741)


 

[Announcement] The 10th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing

News Announcements Published December 01, 2011

The 10th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing (http://ksw2011.biopathway.org) will be held in Oh Sang Soo Seminar Room at KAIST on 12th December, 2011.

Chairs
Jong C. Park (KAIST, Korea)
Limsoon Wong (NUS, Singapore)

Venue
Date: 12th December, 2011
Location: Oh Sang Soo Seminar Room, CS Bldg., KAIST, Daejeon, South Korea



The Fourth International Symposium on Languages in Biology and Medicine (LBM 2011)

News Announcements Published June 05, 2011

The Fourth LBM symposium (LBM 2011, http://lbm2011.biopathway.org) will be held at the Nanyang Technological University (NTU), Singapore as a pre-conference workshop of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), on December 14th and 15th, 2011.



[Announcement] Prof. Junichi Tsujii's Global Lecture: NLP-based Text Mining Techniques and their Applications

News Announcements Published May 13, 2010

Prof. Junichi Tsujii from the University of Tokyo will give a Global Lecture at KAIST as follows.

Title: NLP-based Text Mining Techniques and their Applications
Speaker: Junichi Tsujii
Date: May 31 - June 4, 2010 (1pm-4pm)
Location: Oh Sangsu Seminar Room
Host: Jong C. Park (park@cs.kaist.ac.kr)

[Course Description]
Text mining is considered an essential technology for the future of biological research, providing a means by which scientists can cope with the ever-increasing number of published papers in the domain. This course focuses on an emerging technological field, NLP-based Text Mining, which combines technologies such as natural language processing, ontology engineering, machine learning and distributed databases. In particular, the course discusses how recent research results of deep parsing can be combined with machine learning for event recognition and relation mining in biology.

[Day 1] Challenges of Text Mining for Biology
[Day 2] Deep parsing and linguistic formalism
[Day 3] Empirical Approach to Meaning
[Day 4] Named Entity Recognition and Normalization
[Day 5] Event Recognition and Normalization

[Speaker's Bio]
Junichi Tsujii is a Professor of Computational Linguistics and Natural Language Processing at the University of Tokyo and Professor of Text Mining at the University of Manchester, UK. He has an MSc and a PhD from the Department of Electrical Engineering, Kyoto University. He has been a permanent member of the International Committee of Computational Linguistics (ICCL) since 1994. He was Vice-President (2005) and President (2006) of ACL (Association for Computational Linguistics), and President (2008) of AFNLP (Asian Federation of Natural Language Processing). He was awarded the IBM Science Award in 1988, the SEYMF Visiting Professorship in 2000, the Daiwa-Adrian Prize for a project jointly carried out with Dr. S. Ananiadou (University of Manchester, UK) in 2004, the IBM Faculty Award in 2005, the Achievement Award (Japan Society of Artificial Intelligence) in 2008, and the Medal with Purple Ribbon (the Japanese Government) in 2010.


 

[Announcement] 1st CALBC Workshop

News Announcements Published March 29, 2010

The first CALBC ("Collaborative annotation of a large-scale biomedical corpus") workshop will be held on 19/20 April 2010 at the EBI (European Bioinformatics Institute, U.K.).

Location: European Bioinformatics Institute, Hinxton, Cambridge, U.K.
Date: 19/20 April 2010

The CALBC project is creating a broadly scoped and diversely annotated corpus (several hundred thousand Medline abstracts on immunology annotated with about a dozen semantic groups) by automatically integrating the annotations from different named entity recognition and concept identification systems. The result of the integration process will be a silver standard corpus (SSC).

The CALBC challenge, announced earlier to deal with biomedical named entity recognition (NER) and identification of entities, is now closed. At the CALBC workshop, participants will discuss the outcome of the challenge. The CALBC project partner will explain in detail previous work on the corpus and will present the results from the challenge. Participants will present their work to meet the demands of the challenge.

The CALBC workshop offers a unique opportunity to learn more about this unusual approach to generating a large-scale annotated corpus. One highlight is the session about the exploitation of the scientific literature in Semantic Web applications. The workshop participants and members of the pharmaceutical industry will discuss how Semantic Web applications will profit from semantic enrichment of the scientific literature (as provided by the harmonised CALBC corpus).