Horizons for Information Societies
Seminar #8

The 8th seminar dealt with linguistic support. Dr 川添 (Kawazoe) () and Dr Doan () presented their research on the annotation of multilingual biomedical texts, then Dr Apel () introduced his research on dictionary reversal for German-Japanese at the .

Date: 14 June 2007 (15:00-17:00)
Location: , Tokyo, Japan
Language: English
Attendees: 19 persons
Organization: Dr ()


Semantic Annotation in BioCaster: its Design and Challenges
by Dr

Abstract: The BioCaster project aims to construct a text mining-based system that provides advanced search and analysis of disease outbreak reports on the Web for public health experts, clinicians and researchers interested in infectious diseases. Its key component is the use of automatic learning methods to identify important entities and events using features derived from examples provided by human annotators. The nature of the task requires an expansion of the "markable" categories of concepts, from those referred to by proper nouns (names of persons, organizations, locations) to those referred to by common nouns and noun phrases, and also from context-independent concepts to context-dependent ones (e.g. roles). We will present the design of the annotation schema used for this project and discuss the difficult cases.

Speaker: Dr 川添愛 (Kawazoe) has been a project researcher at the since 2006. She studied theoretical linguistics and received her doctorate in literature from in 2005. Her current research focuses on the design of text annotation schemas by applying formal ontological methodology and linguistic judgements.

Website: <http://biocaster.nii.ac.jp/>


The Roles of Named Entities and Their Roles in Classifying Annotated Biomedical Texts
by Dr

Abstract: This talk investigates the roles of named entities (NEs) in annotated biomedical text classification. In the annotation schema of BioCaster, a text mining system for public health protection, important concepts that reflect information about infectious diseases were conceptually analyzed with a formal ontological methodology. Some concepts were classified as Types, while others were identified as Roles. Types are specified as NE classes and Roles are integrated into NEs as attributes. We focus on the Roles of NEs by extracting them and using them as different features in the classifiers. We discuss in detail the advantages of Roles in annotated biomedical texts. The effect of each Role on the accuracy of text classification is also discussed.
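As a rough illustration of how Roles might enter a classifier as features alongside words and NE Types, here is a minimal sketch. The document structure, the NE class names and the `case` attribute are assumptions made for this example and are not taken from the actual BioCaster schema.

```python
from collections import Counter

# Hypothetical annotated document: each named entity carries a Type (its NE
# class) and optional Role attributes. Class and attribute names here are
# illustrative assumptions, not the real BioCaster annotation schema.
doc = {
    "tokens": ["three", "patients", "were", "infected", "with", "cholera"],
    "entities": [
        {"text": "patients", "type": "PERSON", "roles": {"case": "true"}},
        {"text": "cholera", "type": "DISEASE", "roles": {}},
    ],
}


def extract_features(document, use_roles=True):
    """Return a bag of features: words, NE Types and, optionally, Roles."""
    features = Counter()
    for token in document["tokens"]:
        features["word=" + token.lower()] += 1
    for entity in document["entities"]:
        features["type=" + entity["type"]] += 1
        if use_roles:
            for role, value in sorted(entity["roles"].items()):
                features["role=%s.%s=%s" % (entity["type"], role, value)] += 1
    return features


with_roles = extract_features(doc, use_roles=True)
without_roles = extract_features(doc, use_roles=False)

# Features that exist only because Role attributes were used
role_only = sorted(set(with_roles) - set(without_roles))
print(role_only)
```

Comparing a classifier trained on `with_roles`-style features against one trained on `without_roles`-style features is one simple way to measure the effect of each Role on accuracy.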

Speaker: Dr Doan has been a project researcher at the since 2006. He received his PhD degree in computer science from in September 2005, then moved to the graduate school of information science at as a postdoctoral researcher. His interests are text mining, text categorization, information extraction, and machine learning.

Website: <http://biocaster.nii.ac.jp/>


Dictionary Reversal – Building a German-Japanese Dictionary from Japanese-German Data, Using Vocabulary Analysis, User Cooperation and Approaches from Natural Language Processing
by Dr

Abstract: With over 100,000 headwords and about 250,000 records, the electronic WaDoku-Dictionary is the most comprehensive Japanese-German dictionary of its kind. The ability to search the German entries as well does not by itself make it a German-Japanese dictionary. In many cases a German query returns too many records, and it is difficult for users to decide which Japanese term is the best in a given case. Users might, for example, get obsolete Japanese entries that are important for the completeness of the Japanese side but obviously shouldn’t be used when writing a text in modern Japanese. Native speakers of Japanese will miss further information about the German entries, such as pronunciation, conjugation, declension, valency, etc.

The dictionary reversal project will proceed in several steps to develop more useful German-Japanese dictionary data. We will, for example, detect and mark up entries that cannot or should not be reversed (e.g. definitions or archaic entries). We will mark entry domains that are easily reversible (e.g. computer science, electronics, plant or animal names with scientific names), and we will mark domains that are problematic to reverse (e.g. Buddhist terms, traditional art, traditional medicine). We will further add existing data on German grammar, pronunciation, etc., and revise the data along a frequency list for German words.
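The marking and reversal steps can be pictured with a minimal sketch. The record layout, field names, flags and domain labels below are assumptions invented for this example and do not reflect the actual WaDoku data format. For simplicity the sketch skips non-reversible entries instead of marking them up.

```python
from collections import defaultdict

# Hypothetical Japanese-German records; fields and flag values are invented
# for illustration only.
RECORDS = [
    {"ja": "猫", "de": ["Katze"], "domain": "animal",
     "archaic": False, "definition": False},
    {"ja": "電算機", "de": ["Computer"], "domain": "computer",
     "archaic": True, "definition": False},
    {"ja": "計算機", "de": ["Computer", "Rechner"], "domain": "computer",
     "archaic": False, "definition": False},
    {"ja": "涅槃", "de": ["Nirwana"], "domain": "buddhism",
     "archaic": False, "definition": False},
]

# Domains assumed to be problematic to reverse (labels are hypothetical)
PROBLEMATIC_DOMAINS = {"buddhism", "traditional art", "traditional medicine"}


def reverse_dictionary(records):
    """Build German -> Japanese entries from Japanese -> German records."""
    reversed_entries = defaultdict(list)
    for record in records:
        # Entries that should not be reversed are left out of the result
        if record["archaic"] or record["definition"]:
            continue
        # Entries from problematic domains are kept but flagged for review
        needs_review = record["domain"] in PROBLEMATIC_DOMAINS
        for gloss in record["de"]:
            reversed_entries[gloss].append(
                {"ja": record["ja"], "needs_review": needs_review}
            )
    return dict(reversed_entries)


german_japanese = reverse_dictionary(RECORDS)
for headword in sorted(german_japanese):
    print(headword, german_japanese[headword])
```

In this sketch the entry flagged as archaic no longer appears under "Computer", while the Buddhist term is kept but flagged so that an editor or a cooperating user can check it.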

The short-term aim is to build a usable, comprehensive, free German-Japanese dictionary in only a few months. This data will be improved through user cooperation, as is already done for the Japanese-German data.

Parts of our approach should also be usable for the reversal of other dictionaries, and by applying it to free Japanese-English and English-Japanese dictionaries, we should even be able to carry out an evaluation of our approach.

Speaker: Dr Apel studied Japanology, Sociology and Ethnology at Munich, Germany. He wrote his Ph.D. thesis on Japanese futures research and futures studies while staying at . He has worked on the Japanese-German dictionary WaDoku-Jiten since 1998. From 2004 to 2005 he carried out research at the with a scholarship. He continued research at the as a guest researcher, and since 2007 has held the post of project researcher there.

Website: <http://wadoku.de/>