2025 ARCHIVES

Cambridge Healthtech Institute’s 4th Annual

Machine Learning for Protein Engineering Part 1

Advancing Protein Engineering with AI: Next Generation Models, Data Strategies, and Applications

12 November 2025 ALL TIMES WET (GMT/UTC)

The 2025 Machine Learning for Protein Engineering Part 1 track will explore the expanding role of AI and ML in revolutionizing protein design and optimization. This track will cover key advancements in algorithm development, model evaluation, and data-driven decision-making, providing researchers with tools to enhance predictive accuracy and experimental efficiency. Attendees will gain insights into the challenges of working in low-data environments, the integration of active learning strategies, and the evolution of multimodal models. Discussions will also extend to cutting-edge applications, from clinical biology to modeling undruggable targets, highlighting how ML is shaping the future of biologic discovery and therapeutic development.

Recommended Short Course*
Monday, 10 November, 14:00 – 17:00
SC4: In silico and Machine Learning Tools for Antibody Design and Developability Predictions
*Separate registration required. See short courses page for details. All short courses take place in-person only.

Wednesday, 12 November

07:30 Registration and Morning Coffee

08:25

Chairperson's Remarks

Karin Hrovatin, Bioinformatic Scientist, Merck KGaA

08:30

Lab-in-the-Loop Application for Clinically Relevant Antigen Targets

Ji Won Park, PhD, Principal ML Scientist, Prescient Design, Genentech

We introduce “Lab-in-the-loop,” a paradigm shift for antibody design that orchestrates generative machine learning models, multi-task property predictors, active learning ranking and selection, and in vitro experimentation in a semiautonomous, iterative optimization loop. We apply lab-in-the-loop to four clinically relevant antigen targets: EGFR, IL-6, HER2, and OSM.

09:00

Active Learning for the Prediction of Antibody Pairwise Competition

Akila I. Katuwawala, PhD, Scientist II, Computational Biology, Adimab LLC

Epitope competition assays are a routine part of therapeutic antibody discovery. Modern high-throughput technologies have enabled the generation of complete pairwise binning matrices on large antibody panels. However, running the experiment for large panels is time-consuming and costly. In the active learning workflow, an ML model is trained on experimentally obtained pairwise binning information for a subset of the interactions and then predicts the remainder. Blind testing across twelve distinct panels of antibodies, including IgGs and HCAbs, targeting a variety of antigens, places the accuracy of the approach in the range of 85-90%.

09:30

Design and Analysis of a Large Library-on-Library Dataset to Reveal Insights on Protein Stability across Different VHH:Antigen Complexes

Jurrian de Kanter, PhD, Data Scientist, Genmab

Accelerating antibody-based medicine development requires better understanding and prediction of antibody-antigen binding. Affinity datasets from mutational scans are valuable but poorly understood. We present a multi-modal dataset of antigen and VHH interface variants, capturing affinity, stability, and expression changes. Most affinity changes stem from stability shifts. Structure-conditioned inverse folding models predict stability well but struggle with interface changes, underscoring the need for high-quality datasets in protein engineering.

10:00

Maximise AI Potential in Biologics Discovery and Development: From Model Training to Consumption

Nicola Bonzanni, CEO, ENPICOM

We will discuss the key challenges in creating and deploying machine learning for biologics discovery. While creating complex models for discovery and development is becoming commonplace, managing the entire ML model lifecycle is essential for effective use in therapeutic research and maximising AI investment returns. Discover how a unified platform can streamline AI use in biologics discovery, from model training to consumption.

10:30 Coffee Break in the Exhibit Hall with Poster Viewing

11:15

To What Extent Can Large-Language Models Represent 3D Information?

Isaac Ellmen, Researcher, Oxford Protein Informatics Group, University of Oxford

In machine learning, protein sequences are usually processed by Transformers/LLMs, while protein structures are typically represented as graphs and processed by GNNs. However, AlphaFold3 and recent studies on protein LLMs show that Transformers alone can make sense of 3D inputs. Here I will share our recent work on deciphering the inner workings of Transformers when given 3D coordinates. These insights can guide the rational design of hybrid sequence/structure protein models.

11:45

KEYNOTE PRESENTATION: Evolution of Biologics Engineering: Integrating AI into Biologics Discovery

Rebecca Croasdale-Wood, PhD, Senior Director, Augmented Biologics Discovery & Design, Biologics Engineering, Oncology, AstraZeneca

Augmenting biologics discovery with AI and novel computational tools holds the promise to transform the field of biologics discovery. In this keynote, we will explore the latest advancements in in silico biologics design and optimisation technologies, highlighting our internal platform capabilities. Additionally, we will review the impact of standalone technologies and the benefits of integrating novel in silico methods with our existing biologics discovery platforms.

12:15

LUNCHEON PRESENTATION: From Strategy to Scale: Engineering the Future of Biologics with Generative AI

Stef van Grieken, CoFounder & CEO, Cradle

Biologics discovery is being transformed by the convergence of machine learning, automation, and molecular engineering. At Cradle, we are building software tools that enable protein engineers to harness generative AI for designing and optimizing biologics with unprecedented speed, accuracy, and scalability. By integrating AI models into the design–build–test–learn (DBTL) cycle, our platform helps scientists dramatically reduce cycle times during lead optimization, compressing experimental feedback loops. This talk will explore how generative models can be made intuitive, interpretable, and reliable enough to turn complex model outputs into actionable design suggestions.

12:45 Luncheon in the Exhibit Hall with Poster Viewing

13:45

Chairperson's Remarks

Olga Obrezanova, PhD, AI Principal Scientist, Biologics Engineering, Oncology R&D, AstraZeneca

13:50

Evaluation of Digital Protein Design Tools in an Industry Setting

Karin Hrovatin, Bioinformatic Scientist, Merck KGaA

Stephanie Linker, PhD, Senior Computational Biochemist, Merck Group

As traditional protein development is resource-intensive, Merck is leveraging digital approaches to design and assess new proteins, increasing the hit rate of laboratory screens. We combine state-of-the-art tools for protein structure prediction (AlphaFold3-type models), representation (ESM), and de novo generation (diffusion models) with classical computational biochemistry and bioinformatics methods. We will present the application of our pipeline and discuss current gaps and potential solutions based on ongoing developments in the field.

14:20

Towards Accurate Biomolecular Modelling and Design with Boltz

Talip Ucar, Founding Member, Boltz

Jeremy Wohlwend, PhD, CTO, Boltz

We present Boltz, a family of open models for biomolecular modelling and design. Combining advances in structure prediction, affinity learning, and generative modelling, Boltz enables accurate reconstruction of molecular interactions and the creation of new functional biomolecules across diverse targets. Experimental validations demonstrating high-affinity binders illustrate the potential of large, open biomolecular models to accelerate molecular discovery.

14:50

End-to-End AI Applications for Advanced Antibody Engineering and Multispecifics Design

Mary Ann Pohl, Director, Alliance Management, Ailux

Artificial intelligence (AI) is transforming antibody discovery and engineering. Ailux's platform synergistically combines the best of our comprehensive wet lab, AtlaX biologics database, and three proprietary AI engines. We will present our latest case studies that exemplify our AI-driven approach to advanced antibody engineering and multispecifics design. This presentation provides our realistic and evidence-based perspective on AI’s impact on developing next-generation antibody therapeutics.

15:20 Transition to Keynote Session

15:30 PANEL DISCUSSION:

Future of Biologic Therapeutics: Will Half-Life Extended Peptides Replace Multispecific Antibodies?

PANEL MODERATOR:

Daniel Chen, MD, PhD, Founder & CEO, Synthetic Design Lab

Describe the technology and data: engineered antibodies and engineered peptides
Discuss, compare, and contrast data
Discuss future applications

PANELISTS:

Paul J. Carter, PhD, Genentech Fellow, Antibody Engineering, Genentech

G. Jonah Rainey, PhD, Associate Vice President, Eli Lilly and Company

Janine Schuurman, PhD, Biotech Consultant, Lust for Life Science B.V.

16:35 Refreshment Break in the Exhibit Hall with Poster Viewing

17:15

Structure-Based Calculations for Predicting Properties and Profiling Antibody Therapeutics

Nels Thorsteinson, Director of Biologics, Chemical Computing Group

We present a method for modeling antibodies and performing pH-dependent conformational sampling, which can enhance property calculations. Structure-based descriptors are evaluated for their predictive performance on HIC and viscosity data. From this, we devised four rules for therapeutic antibody profiling that address developability issues arising from hydrophobicity and charge-based solution behavior, and we assess their ability to enrich for antibodies approved by the FDA. Antibody modeling and docking accuracy is assessed and compared to recent ML tools.

17:45

AI-Guided Discovery and Engineering of a Dual-Specific scFv

Ryan Emerson, PhD, Vice President, Data Science, A Alpha Bio Inc.

Dual-specific antibodies are a promising but challenging modality. We demonstrate a combination of experimental and computational techniques to discover and optimise a TIGIT/LILRB4 dual-specific binder. Starting from a synthetic humanoid phage library, we identified compatible scFvs, used multiplexed yeast display affinity data to confirm binding and generate a training dataset, and applied a fine-tuned deep learning affinity oracle for optimisation, yielding a molecule with improved dual-target affinity and developability.

18:15

New Specificities and Ultra-High Affinities: Can Sequence-Trained LLMs Predict Labels They Have Never Seen?

Tzvika Hartman, PhD, Senior Vice President, Computational, Biolojic Design Ltd.

Typical ML models are trained by masking and predicting experimentally determined labels. However, in novel drug discovery the goal is often to design antibodies that are better than previously observed ones or even have entirely new characteristics. In this work, we demonstrate that integrating pretrained LLMs with datasets featuring continuous labels allows prediction of binders with novel specificities and with much better affinities than those seen previously in experiments.

18:45

OpenFold3: A Frontier Model for Biomolecular Structure Prediction

Vinay S Swamy, Computational Biologist, Biomedical Informatics, Columbia University

The OpenFold Consortium brings together academic and industrial teams to build state-of-the-art protein structure and co-folding prediction models optimised for use on commercial computational hardware. We develop fully open-source models and support the creation of new experimental datasets, aiming to build more powerful models that can accurately predict complex systems of significance to the life sciences. With our latest release, we aim to reproduce the full scale and training regimen of AlphaFold3 and provide open-source model weights, extensive datasets, and a permissively licensed code library for developing novel architectures and custom training pipelines.

19:15 Close of Machine Learning for Protein Engineering Part 1 Conference