Machine Learning for Protein Engineering

In silico prediction, engineering, and design are changing how drugs are discovered, designed, and optimised. These tools are still at an early stage of development, and much remains to be learned about how to adapt them for antibody and vaccine discovery, training, prediction, developability, simulation, and optimisation.

Sunday, 13 November

14:00 Recommended Short Course*

SC5: Machine Learning Tools for Protein Engineering
*Separate registration required. See short courses page for details.

Wednesday, 16 November

07:30 Registration and Morning Coffee (Garden Room)




Chairperson's Remarks

Enkelejda Miho, PhD, Professor, Dean, University of Applied Sciences and Arts Northwestern Switzerland


Highly Accurate Protein Structure Prediction with AlphaFold

Simon Kohl, PhD, Senior Research Scientist, DeepMind

Predicting a protein’s structure from its primary sequence has been a grand challenge in biology for the past 50 years. In this talk, we will describe work at DeepMind to develop AlphaFold2, a new deep learning-based system for structure prediction that achieves high accuracy across a wide range of targets. The talk will cover both the underlying machine learning ideas and the implications for biological research.


Antibody Paratope States Improve Structure Prediction to Elucidate Antibody-Antigen Recognition

Monica L. Fernandez-Quintero, PhD, Postdoc Research Scientist, General Inorganic & Theoretical Chemistry, University of Innsbruck

Describing an antibody’s binding site using only a single static structure limits the understanding of the antibody’s function. This limitation is even more pronounced when no experimentally determined structure is available or when the crystal structure is distorted by packing effects, which can result in misleading antibody paratope structures. To improve antibody structure prediction and to take the strongly correlated loop and interface movements into account, antibody paratopes should be described as interconverting states in solution. Defining kinetically and functionally relevant states can therefore improve the accuracy of structure prediction and enhance the understanding of antibody-antigen recognition.



POSTER HIGHLIGHT: Antibody Scanning of Beta-2 Microglobulin

Montader Ali, Graduate Student, Chemistry, University of Cambridge

Determining which protein regions selectively affect function and biophysical properties is crucial in the development of diagnostics and therapeutics. Key regions of a protein structure can be identified through antibody scanning, which combines a phenotypic assay with a library of antibodies targeting different epitopes on a protein’s surface. Beta-2 microglobulin (B2M) is a well-characterised protein that is associated with a wide range of diseases. Understanding the structural or functional role of B2M's surface epitopes can be clinically beneficial. Using an in silico design strategy we recently developed, nanobodies with high stability and affinity against B2M were generated to perform antibody scanning.


POSTER HIGHLIGHT: Creation of Monoclonal Antibodies via Phage Display to Target Immunoreactive Part of LPS

Alexandra Fux, Graduate Student, Medical Biology, University of Salzburg

Lipopolysaccharide (LPS, endotoxin) is a cell wall component of Gram-negative bacteria and is highly toxic upon entering the human body. Reliable endotoxin removal and detection assays are in high demand, yet many assays struggle to provide animal-free, LPS-specific, and low-priced approaches. This study focuses on the creation of an artificial antibody via phage display that specifically binds Lipid A and paves the way towards a bead-based detection and removal tool.


POSTER HIGHLIGHT: Integration of Clinical, Laboratory and Multi-Omics Data to Leverage Machine Learning for Diagnostics

Jan Kruta, Research Associate, School of Life Sciences, University of Applied Sciences & Arts Northwestern Switzerland

Early and accurate disease detection is crucial for preventing disease development and defining therapy strategies. Autoimmune diseases are notoriously challenging for clinicians to diagnose. Clinical decision support systems (CDSS) are a promising technology for more precise diagnostics. However, because omics data are difficult to integrate with clinical characteristics, such systems are often limited to certain data types. We developed an integration pipeline that reliably diagnosed patients based on multi-omics data combined with clinical and laboratory data. Our results uncover insights in the field of autoimmune diseases and can be adapted for applications across disease conditions.


POSTER HIGHLIGHT: Personalized Medical Platform to Support Diagnosis of Autoimmune Diseases with Artificial Intelligence

Patrick Meier, University of Applied Sciences & Arts Northwestern Switzerland


Doctors struggle to diagnose autoimmune diseases because of overlapping symptoms and a multitude of laboratory test results. Together with physicians and researchers we have developed a Swiss and European prototype of a clinical decision support system, Personalis. The software uses artificial intelligence to detect patterns in the genetic, biomedical, and clinical data. These predictions help physicians make an early and correct diagnosis of autoimmune diseases.


POSTER HIGHLIGHT: Humanness Assessment with Machine Learning for De Novo Nanobody Design

Aubin Ramon, PhD Student, Chemistry, University of Cambridge

Computational methods are emerging as a way to design, in silico, new therapeutic single-domain antibodies (nanobodies) targeting a wide range of biological molecules. We have developed a humanness tool that uses a deep vector-quantised variational auto-encoder (VQ-VAE), trained with unsupervised learning, to assess the similarity of de novo-designed nanobodies to antibodies produced by human immune systems. This humanness assessment also shows significant correlation with antibody immunogenicity data.

10:00 Coffee Break in the Exhibit Hall with Poster Viewing (Verdi and Vivaldi 1&2)


From Data to Predictions: Virtual Screening for Multi-Specific Protein Therapeutics

Norbert Furtmann, PhD, Head, Computational & High-Throughput Protein Engineering, Large Molecule Research, Sanofi

Our novel, automated high-throughput engineering platform enables the fast generation of large panels of multi-specific variants (up to 10,000), giving rise to large data sets (more than 100,000 data points). By mining these data sets, we were able to extract engineering patterns and to develop AI-based virtual screening workflows that guide the exploration of huge design spaces for multi-specific biologics drug discovery.



Chairperson's Remarks

M. Frank Erasmus, PhD, Head, Bioinformatics, Specifica, Inc.


Design of Biopharmaceutical Formulations Accelerated by Machine Learning

Paolo Arosio, PhD, Assistant Professor, Chemistry & Applied Biosciences, ETH Zurich

The multiple biophysical properties that together define the developability of biologics depend not only on protein sequence but also on buffer composition. Here we show how machine learning algorithms can accelerate the design of biopharmaceutical formulations that simultaneously optimize multiple biophysical properties.


AI-Derived Antibody Discovery – Humanoids for Global Good

Joshua Smith, PhD, Molecular Design, Principal Scientist, Just – Evotec Biologics

At Just – Evotec Biologics, we’re developing cutting-edge machine learning technologies to accelerate antibody drug development. In this talk, I’ll focus on the Antibody-GAN – an ML framework that allows us to generate limitless antibody drug candidates with desirable properties. I’ll describe the discovery platform we’ve built upon this technology (J.HAL) and how we’re using data from this platform to develop even more powerful tools for discovery and design.

12:15 An Integrated Discovery Platform: From NovaSeq to an Optimised Antibody

Jannick Bendtsten, CEO, PipeBio

Machine learning and AI are enhancing the drug discovery process and hold the promise of computationally derived antibodies. PipeBio is a leading bioinformatics platform that helps pharmaceutical companies develop higher-quality antibodies by enabling scientists to hitpick from massive amounts of antibody sequence and assay data themselves. We provide a single integrated platform where scientists can perform a range of analyses in one place. Easy labelling, curation, and consistent storage of sequence and assay data allow for the deep analysis of datasets as large as NovaSeq, as well as ML-assisted engineering of single sequences and everything in between.

12:30 NGS-Guided Selections Enhanced with Early-Stage Biophysical Screening

M. Frank Erasmus, PhD, Head, Bioinformatics, Specifica, Inc.

We show how NGS-guided selection strategies from in vitro antibody discovery campaigns, combined with early-stage screening, can be used to improve lead prioritization using our cloud-native bioinformatics platform, AbXtract™. Our approach uses a broad sampling mechanism along with clustering to minimize the impact of population biases (e.g., clonal dominance) and obtain a comprehensive understanding of the underlying antibody population, which we subject to rapid biophysical screening for critical feedback metrics.

12:45 Enjoy Lunch on Your Own

13:50 Dessert Break in the Exhibit Hall & Last Chance for Poster Viewing (Verdi and Vivaldi 1&2)

14:45 Breakout Discussions

Breakout Discussions are informal, moderated, small-group discussions that allow participants to exchange ideas and experiences and develop future collaborations around a focused topic. Each discussion will be led by a facilitator who keeps the discussion on track and the group engaged. For in-person events, the facilitator will lead while sitting with delegates around a table. For virtual attendees, the discussion will take place on an online networking platform. To get the most out of this format, please come prepared to share examples from your work, take part in a collective problem-solving session, and participate in active idea sharing.


Best Practices for Using Machine Learning in NGS-Guided Antibody Discovery IN PERSON ONLY

M. Frank Erasmus, PhD, Head, Bioinformatics, Specifica, Inc.

  • What questions do you aim to address within a given NGS-guided discovery campaign?
  • How does unsupervised or supervised machine learning aid in this NGS-guided discovery effort?
  • Is deep learning required for your particular application or do shallow learning approaches/simple heuristics suffice?
  • How do you collect/prepare your data to establish an accurate ground truth?
  • How do you validate your ML model? 
  • What data encoding/reduction methods do you employ (e.g. one-hot encoding, physicochemical tokenization) to represent your sequence data? 
  • Does 3D coordinate information enhance your sequence-based dataset? Which tools do you use when structure is unavailable?



Chairperson's Remarks

M. Frank Erasmus, PhD, Head, Bioinformatics, Specifica, Inc.


Applications of Machine Learning and Informatics in Antibody Discovery

Charlotte M. Deane, PhD, Professor of Structural Bioinformatics, Statistics, University of Oxford

Machine learning has shown its power across all of biology. In this talk, I will describe some of the novel machine learning tools we are pioneering in the area of biotherapeutics, from computational humanisation to accurate, rapid structure prediction and virtual high-throughput screening.


Automated Optimisation of Antibody Developability Potential

Pietro Sormanni, PhD, Group Leader, Royal Society University Research Fellow, Chemistry of Health, Yusuf Hamied Department of Chemistry, University of Cambridge

The development of biologics with suitable functionality into practically useful molecules is often impeded by developability issues. Conformational stability and solubility are arguably the most important biophysical properties underpinning developability potential, as they determine colloidal stability and aggregation, and correlate with yield and polyreactivity. I will present a computational pipeline, and corresponding experimental validation, for the automated design of antibody variants with improved stability and solubility.



Chairperson's Remarks

Victor Greiff, PhD, Associate Professor, Immunology, University of Oslo


Deciphering the Language of Antibodies Using Self-Supervised Learning

Jinwoo Leem, PhD, Associate Director, Data Science, Alchemab Therapeutics

An individual’s B cell receptor (BCR) repertoire encodes information about past immune responses and potential for disease protection. Deciphering the information in BCR sequence datasets will transform our understanding of disease and enable discovery of novel antibody therapeutics. Here, we present an antibody-specific language model, AntiBERTa; it learns a rich, biologically relevant representation of BCR sequences, and the model is generalizable to a number of applications, such as paratope prediction.


Discovery in the Age of AlphaFold


Enkelejda Miho, PhD, Professor, Dean, University of Applied Sciences and Arts Northwestern Switzerland


Charlotte M. Deane, PhD, Professor of Structural Bioinformatics, Statistics, University of Oxford

Monica L. Fernandez-Quintero, PhD, Postdoc Research Scientist, General Inorganic & Theoretical Chemistry, University of Innsbruck

Norbert Furtmann, PhD, Head, Computational & High-Throughput Protein Engineering, Large Molecule Research, Sanofi

Juan Carlos Mobarec, PhD, Head Computational Structural Biology - Associate Director, Mechanistic and Structural Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK

17:30 Close of Summit