Cambridge Healthtech Institute’s 4th Annual

Machine Learning for Protein Engineering Part 1

Advancing Protein Engineering with AI: Next Generation Models, Data Strategies, and Applications

12 November 2025 ALL TIMES WET (GMT/UTC)


The 2025 Machine Learning for Protein Engineering Part 1 track will explore the expanding role of AI and ML in revolutionizing protein design and optimization. This track will cover key advancements in algorithm development, model evaluation, and data-driven decision-making, providing researchers with tools to enhance predictive accuracy and experimental efficiency. Attendees will gain insights into the challenges of working in low-data environments, the integration of active learning strategies, and the evolution of multimodal models. Discussions will also extend to cutting-edge applications, from clinical biology to modeling undruggable targets, highlighting how ML is shaping the future of biologic discovery and therapeutic development.

Recommended Short Course*
Monday, 4 November, 14:00 – 17:00
SC4: In silico and Machine Learning Tools for Antibody Design and Developability Predictions
*Separate registration required. See short courses page for details. All short courses take place in-person only.





Wednesday, 12 November

Registration and Morning Coffee

ACTIVE LEARNING AND TRAINING DATA GENERATION

Chairperson's Remarks

Karin Hrovatin, Bioinformatic Scientist, Merck KGaA

Lab-in-the-Loop Application for Clinically Relevant Antigen Targets

Ji Won Park, PhD, Principal ML Scientist, Prescient Design (AI for Drug Discovery), Genentech

We introduce “Lab-in-the-loop,” a paradigm shift for antibody design that orchestrates generative machine learning models, multi-task property predictors, active learning ranking and selection, and in vitro experimentation in a semiautonomous, iterative optimization loop. We apply lab-in-the-loop to four clinically relevant antigen targets: EGFR, IL-6, HER2, and OSM.

Active Learning for the Prediction of Antibody Pairwise Competition

Akila I. Katuwawala, PhD, Scientist II, Computational Biology, Adimab LLC

Epitope competition assays are a routine part of therapeutic antibody discovery. Modern high-throughput technologies have enabled the generation of complete pairwise binning matrices on large antibody panels. However, running the experiment for large panels is time-consuming and costly. In the workflow presented here, an ML model is trained on experimentally obtained pairwise binning information for a subset of the interactions and then predicts the remaining interactions. Blind testing across twelve distinct panels of antibodies, including IgGs and HCAbs, targeting a variety of antigens, places the accuracy of the approach in the range of 85-90%.

Design and Analysis of a Large Library-on-Library Dataset to Reveal Insights on Protein Stability across Different VHH:Antigen Complexes

Jurrian de Kanter, PhD, Data Scientist, Genmab

Accelerating antibody-based medicine development requires better understanding and prediction of antibody-antigen binding. Affinity datasets from mutational scans are valuable but poorly understood. We present a multi-modal dataset of antigen and VHH interface variants, capturing affinity, stability, and expression changes. Most affinity changes stem from stability shifts. Structure-conditioned inverse folding models predict stability well but struggle with interface changes, underscoring the need for high-quality datasets in protein engineering.

Coffee Break in the Exhibit Hall with Poster Viewing

SEQUENCE-CENTRIC MODELS

To What Extent Can Large-Language Models Represent 3D Information?

Isaac Ellmen, Researcher, Oxford Protein Informatics Group, University of Oxford

In machine learning, protein sequences are usually processed by Transformers/LLMs, while protein structures are typically represented as graphs and processed by GNNs. However, AlphaFold3 and recent studies on protein LLMs show that Transformers alone can make sense of 3D inputs. Here I will share our recent work on deciphering the inner workings of Transformers when given 3D coordinates. These insights can guide the rational design of hybrid sequence/structure protein models.

KEYNOTE PRESENTATION: Evolution of Biologics Engineering: Integrating AI into Biologics Discovery

Rebecca Croasdale-Wood, PhD, Senior Director, Augmented Biologics Discovery & Design, Biologics Engineering, Oncology, AstraZeneca

Augmenting biologics discovery with AI and novel computational tools holds the promise to transform the field of biologics discovery. In this keynote, we will explore the latest advancements in in silico biologics design and optimisation technologies, highlighting our internal platform capabilities. Additionally, we will review the impact of standalone technologies and the benefits of integrating novel in silico methods with our existing biologics discovery platforms.


Luncheon in the Exhibit Hall with Poster Viewing

EXPANDING AND OPTIMISING THE ML MODEL AND ALGORITHM TOOLKIT

Chairperson's Remarks

Olga Obrezanova, PhD, AI Principal Scientist, Biologics Engineering, Oncology R&D, AstraZeneca

Evaluation of Digital Protein Design Tools in an Industry Setting

Karin Hrovatin, Bioinformatic Scientist, Merck KGaA
Stephanie Linker, PhD, Senior Computational Biochemist, Digital Innovation, Merck Group

As traditional protein development is resource-intensive, Merck is leveraging digital approaches to design and assess new proteins, increasing the hit rate of laboratory screens. We combine state-of-the-art tools for protein structure prediction (AlphaFold3-type models), representation (ESM), and de novo generation (diffusion models) with classical computational biochemistry and bioinformatics methods. We will present the application of our pipeline and discuss current gaps and potential solutions based on ongoing developments in the field.

Towards Accurate Biomolecular Modelling and Design with Boltz

Talip Ucar, Founding Member, Boltz
Jeremy Wohlwend, PhD, CTO, Boltz

We present Boltz, a family of open models for biomolecular modelling and design. Combining advances in structure prediction, affinity learning, and generative modelling, Boltz enables accurate reconstruction of molecular interactions and the creation of new functional biomolecules across diverse targets. Experimental validations demonstrating high-affinity binders illustrate the potential of large, open biomolecular models to accelerate molecular discovery.

Transition to Keynote Session

PLENARY DEEP DIVE

PANEL DISCUSSION: Future of Biologic Therapeutics: Will Half-Life Extended Peptides Replace Multispecific Antibodies?

Panel Moderator:

Daniel Chen, MD, PhD, Founder & CEO, Synthetic Design Lab

Panelists:

Paul J. Carter, PhD, Genentech Fellow, Antibody Engineering, Genentech
G. Jonah Rainey, PhD, Associate Vice President, Eli Lilly and Company
Janine Schuurman, PhD, Biotech Consultant, Lust for Life Science B.V.

Refreshment Break in the Exhibit Hall with Poster Viewing

NEXT GENERATION APPLICATIONS FOR AI AND MACHINE LEARNING

AI-Guided Discovery and Engineering of a Dual-Specific scFv

Ryan Emerson, PhD, Vice President, Data Science, A Alpha Bio Inc.

Dual-specific antibodies are a promising but challenging modality. We demonstrate a combination of experimental and computational techniques to discover and optimise a TIGIT/LILRB4 dual-specific binder. Starting from a synthetic humanoid phage library, we identified compatible scFvs, used multiplexed yeast display affinity data to confirm binding and generate a training dataset, and applied a fine-tuned deep learning affinity oracle for optimisation, yielding a molecule with improved dual-target affinity and developability.

New Specificities and Ultra-High Affinities: Can Sequence-Trained LLMs Predict Labels They Have Never Seen?

Tzvika Hartman, PhD, Senior Vice President, Computational, Biolojic Design Ltd.

Typical ML models are trained by masking and predicting experimentally determined labels. However, in novel drug discovery the goal is often to design antibodies that are better than previously observed ones or even have entirely new characteristics. In this work, we demonstrate that integrating pretrained LLMs with datasets featuring continuous labels allows prediction of binders with novel specificities and with much better affinities than those seen previously in experiments.

OpenFold3: A Frontier Model for Biomolecular Structure Prediction

Vinay S Swamy, Computational Biologist, Biomedical Informatics, Columbia University

The OpenFold Consortium brings together academic and industrial teams to build state-of-the-art protein structure and co-folding prediction models optimised for use on commercial computational hardware. We develop fully open-source models and support the creation of new experimental datasets, aiming to build more powerful models that can accurately predict complex systems of significance to the life sciences. With our latest release, we aim to reproduce the full scale and training regimen of AlphaFold3 and provide open-source model weights, extensive datasets, and a permissively licensed code library for developing novel architectures and custom training pipelines.

Close of Machine Learning for Protein Engineering Part 1 Conference


For more details on the conference, please contact:

Kent Simmons
Senior Conference Director
Cambridge Healthtech Institute
Phone: (+1) 207-329-2964
Email: ksimmons@healthtech.com

For sponsorship information, please contact:

Companies A-K
Jason Gerardi
Sr. Manager, Business Development
Cambridge Healthtech Institute
Phone: (+1) 781-972-5452
Email: jgerardi@healthtech.com

Companies L-Z
Ashley Parsons
Manager, Business Development
Cambridge Healthtech Institute
Phone: (+1) 781-972-1340
Email: ashleyparsons@healthtech.com