Data Driven Chemistry Laboratory

Staff & Contact

Educational Staff	Prof. Yukiharu Uraoka, Prof. Tomoyuki Miyao, Assistant Prof. Akinori Sato
URL	https://sites.google.com/view/naist-chemoinformatics

Education and Research Activities in the Laboratory

Chemoinformatics is a research area in which chemical problems are tackled using tools from informatics. Our mission is to develop chemoinformatics tools that are truly useful and practical for applications in chemistry. For example, molecular representations have been extensively investigated for virtual screening, in which bioactive compounds are identified from a large compound data set. Likewise, investigating appropriate chemical reaction representations for predicting reaction parameters (yield, selectivity) is a current research activity.

Developing such tools and methods requires understanding both domain knowledge (chemistry or biology) and analysis techniques (statistics, machine learning). Prior experience in at least one of the two is preferable for conducting meaningful research. So far, most of the students in our group have had chemistry or biology backgrounds. They have learned informatics techniques through a training program provided by our group: starting from the basics of data analysis (machine learning), you will learn how to handle chemistry-related data and analyze them to obtain useful information. Students with an information-science background can learn the chemistry and biology needed for meaningful research, with a focus on drug discovery.

Research Themes

1. Methodology development for affinity prediction

Virtual screening is the process of selecting potential candidate compounds for a specific target from a compound pool. Ligand-based approaches rely on the principle that similar compounds show similar biological activity. This principle, however, does not necessarily hold when ligand-protein binding mechanisms are considered. Developing methods that extract the key information behind this phenomenon within ligand-based approaches would further improve virtual screening.
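The similarity principle behind ligand-based virtual screening can be illustrated with a minimal sketch. It assumes each compound is represented by a fingerprint given as a set of on-bit indices, and ranks a compound pool by Tanimoto similarity to a known active compound. The compound names and bit indices are hypothetical and are not taken from our research data; the function names are chosen only for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query_fp, pool):
    """Return (name, similarity) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in pool.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints (on-bit indices), for illustration only.
query_fp = {1, 4, 7, 9}              # a known active compound
pool = {
    "cpd_A": {1, 4, 7, 9, 12},       # shares 4 of 5 distinct bits with the query
    "cpd_B": {2, 3, 5},              # shares no bits with the query
    "cpd_C": {1, 4, 20, 21},         # shares 2 of 6 distinct bits with the query
}

ranking = rank_by_similarity(query_fp, pool)
for name, similarity in ranking:
    print(f"{name}: {similarity:.2f}")
```

A ranking of this kind places structurally similar compounds first. As noted above, high fingerprint similarity does not guarantee similar binding behavior, which is the gap this research theme addresses.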

2. Constructing highly predictive soft sensor models using limited data sources

Predicting chemical reaction parameters (yield or selectivity) in advance can contribute not only to reducing experimental costs but also to understanding the reaction mechanism. Once we understand the reaction, optimal experimental conditions (including catalysts) can be proposed. Since data for organic chemical reactions have been accumulated, these data should be utilized effectively.

3. Modeling approaches in the low-data regime

Laboratory-scale chemistry data sets are small: fewer than 50 samples (sometimes around 10), measured under homogeneous experimental conditions. Mechanism-oriented molecular representations combined with traditional machine learning would be a reasonable approach for this type of problem; however, recently developed deep neural network (DNN) techniques, such as meta-learning and pre-training, are also options.
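As a rough illustration of traditional modeling at this scale, the sketch below fits a one-descriptor linear model and evaluates it by leave-one-out cross-validation. Leave-one-out is our assumption here, chosen because it is a common option when only about 10 samples are available; the text above does not prescribe a validation scheme. The descriptor values and yields are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Assumes the x values are not all identical (otherwise the slope is undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def loo_predictions(xs, ys):
    """Predict each sample from a model trained on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical data: one mechanism-oriented descriptor and reaction yield (%).
descriptor = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9, 1.1, 1.2, 1.4, 1.5]
yield_pct = [12.0, 21.0, 24.0, 35.0, 37.0, 49.0, 58.0, 61.0, 73.0, 75.0]

loo_rmse = rmse(yield_pct, loo_predictions(descriptor, yield_pct))
print(f"Leave-one-out RMSE: {loo_rmse:.2f} yield points")
```

Because every sample is held out exactly once, this estimate uses all of the scarce data for both training and testing, at the cost of refitting the model once per sample.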

Recent Research Papers and Achievements

S. Shibayama, H. Kaneko, K. Funatsu, Comput. Chem. Eng., 2018, 113, 86-97
T. Miyao, K. Funatsu, J. Bajorath, F1000Research, 2017, 6, 1285
T. Miyao, H. Kaneko, K. Funatsu, J. Chem. Inf. Model., 2016, 56, 286-299