Cambridge Healthtech Institute's 8th Annual

Leveraging Data Science for Enhanced Expression and Production

Implementing Data and Models to Streamline the Process

11 November 2025 ALL TIMES WET (GMT/UTC)

The growing demand for recombinant proteins is driving the integration of data science with engineering strategies to optimize hosts and production workflows. This includes target gene verification, codon optimization, vector design, and clone/host selection in parallel with exploring high-throughput expression systems, data-driven design strategies, and workflow automation. Each requires careful analysis of complex variables. Cambridge Healthtech Institute’s 8th Annual Leveraging Data Science for Enhanced Expression and Production conference at PEGS Europe convenes protein and data scientists pioneering deep learning applications to enhance cell line engineering, protein expression, and scalable production strategies to streamline experiments and reduce time and costs.

Tuesday, 11 November

07:30 Registration and Morning Coffee

BUILDING AND LEVERAGING EXPRESSION PREDICTION MODELS

08:25

Chairperson's Welcome Remarks

Helena Maja Firczuk, PhD, Group Leader, Protein and Cellular Sciences, GSK

08:30

FEATURED PRESENTATION: FAIR Data to Predict Recombinant Protein Expression

Lovisa Holmberg Schiavone, PhD, Director, Protein Sciences, Structure & Biophysics, Discovery Sciences, R&D, AstraZeneca

We have leveraged internal recombinant protein production data that adhere to the F.A.I.R. guiding principles, along with large external datasets from the SGC, to build, together with EMBL-EBI, a predictive model of E. coli-based protein production: RP3Net (Recombinant Protein Production Network). The model has been tested on a set of 46 proteins that were curated from the human proteome, avoiding proteins with prior published evidence of successful expression.

09:00

Utilising Learnings from High-Throughput Protein Expression Platforms to Enhance Delivery of Fit-for-Purpose Reagents

Helena Maja Firczuk, PhD, Group Leader, Protein and Cellular Sciences, GSK

I will present an overview of GSK's high-throughput expression platforms and the advantages of employing them, such as streamlining delivery of protein reagents and enabling multiparameter optimisation of complex reagent generation. They also enable the collection of vast amounts of well-curated data for human and machine learning. This data was used to build and parameterise a model to predict protein expression in various systems.

09:30

Closed Loop Autonomous Learning for Protein Engineering

James D. Love, PhD, Vice President, Cross Modality Workflows, Novo Nordisk AS

Closed loop autonomous learning for protein engineering is a vision of a possible future that may connect AI to physical hardware, resulting in experimental design, execution, and analysis, while minimising human input. This is attractive, as it offers the possibility of more rapid discovery and development of therapeutic lead candidates. This talk will cover ongoing work by Novo Nordisk and our collaborators, and present some findings and future directions.

10:00 Enabling Drug Discovery through Innovation in Protein Purification and Screening Technologies 

Andreas Kiessling, Project Manager, Cube Biotech GmbH

At Cube Biotech, we make proteins accessible and high-throughput compatible. This presentation highlights our screening capabilities, including Strep-Tactin XT, Ni-INDIGO magnetic beads, and the Rho1D4 tag system, as well as our NativeMP platform, which allows unlocking challenging targets like GPCRs, ion channels, and transporters. Our automation-ready technologies for soluble and membrane proteins accelerate workflows from target ID to screening with consistency.

10:15 High-Throughput Signal Peptide Engineering: A New Frontier for Efficient Antibody Engineering

Tero-Pekka Alastalo, CEO, Avenue Biosciences Inc.

Monoclonal antibodies are top-selling biologics, yet efficient production remains challenging due to secretion complexity. Using machine learning and high-throughput chemistry, we performed a multi-thousand signal peptide screen to identify the optimal match for multi-chain proteins. Systematic engineering enhanced antibody expression compared with the industry standard while maintaining quality. Our technology offers a first-to-market solution to overcome challenges in the protein secretory pathway and enhance the production and development of antibody-based biologics.

10:30 Grand Opening Coffee Break in the Exhibit Hall with Poster Viewing

INTEGRATING LEARNINGS FOR PROTEIN FORM AND FUNCTION

11:15

The SGC and Target 2035: Generating Proteins and Ligands to Enable Machine Learning

Nicola Burgess-Brown, PhD, Professorial Research Fellow, UCL, London; COO, Protein Sciences, Structural Genomics Consortium

The SGC, a global public-private partnership, uncovers novel human biology through structural genomics and chemical biology approaches. Target 2035 aims to develop tool molecules for every human protein by creating massive open datasets of high-quality protein-small molecule binding data, using DNA-encoded libraries and affinity selection mass spectrometry platforms. Models built from these data will allow prediction of new and more drug-like small molecule binders, which will be tested experimentally.

11:45

Severe Deviation in Protein Fold Prediction by Advanced AI: Case Studies

Jacinto López Sagaseta, PhD, Head, Protein Crystallography and Structural Immunology Unit, Navarrabiomed

Artificial intelligence and deep learning have significantly advanced structural biology, achieving unprecedented accuracy in modeling folds directly from amino acid sequences. Despite these advances, deviations from empirical structures are not uncommon, and experimental determination of protein folds remains vital for the advance of structural biology and biomedicine.

12:15 LUNCHEON PRESENTATION: Navigating Sequence Space: A DoE and Machine Learning Strategy for Antibody Optimisation

Claes Gustafsson, Co-Founder, ATUM

ProteinGPS is ATUM's protein engineering platform that couples Design of Experiments (DoE) and machine learning to navigate the vast potential sequence space. Drug development has historically focused on optimising one feature at a time. Our holistic, data-driven approach optimises multiple performance and developability properties simultaneously. By running smaller screens and learning from each iterative cycle to create the next set of variants, ProteinGPS accelerates development and produces molecules with a better balance of critical quality attributes. We will present case studies demonstrating significant, multi-property improvements generated much more efficiently than with traditional methods.

12:45 Luncheon in the Exhibit Hall with Poster Viewing

DECODING GENETIC RULES TO BOOST EXPRESSION

13:45

Chairperson's Remarks

Nicola Burgess-Brown, PhD, Professorial Research Fellow, UCL, London; COO, Protein Sciences, Structural Genomics Consortium

13:50

Sequence-to-Expression Optimisation with Machine Learning

Diego Oyarzún, PhD, Professor of Computational Biology, University of Edinburgh

Thanks to progress in high-throughput DNA synthesis and sequencing, artificial intelligence and machine learning have emerged as leading approaches for building sequence-to-expression models for strain optimisation. In this talk, I will discuss our recent progress on using this technology for designing novel regulatory and coding sequences with improved expression phenotypes, using a combination of supervised learning and optimisation algorithms.

14:20

Decoding the Rules of Genetic Syntax to Improve Transgene Design

Jarrod Shilts, PhD, Group Leader, ExpressionEdits Ltd.

Despite recent advances in our understanding of genetic features that promote robust protein expression, transgenes in biotechnology have remained largely unchanged for decades. Natural human genes are rich in intron sequences that can drive crucial expression benefits, but these have previously been difficult to replicate in artificial transgenes. At ExpressionEdits, we're changing this by deciphering ‘genetic syntax’ using high-throughput screening and machine learning to design intronised transgenes with improved protein expression.

14:50

A Genetic Cure to Cell Line Instability

Louise Lindbaek, PhD, Team Lead, CHO Cell Line Engineering, Enduro Genetics ApS

Manufacturing proteins in CHO and microbial cells faces challenges in maintaining high cellular productivity over many cell divisions. We have developed a plug-in gene technology that prevents cell line production instability. The plugins link cell growth to antibody secretion using biosensors that regulate essential genes in CHO cells and beyond. This technology enables stable production and supports continuous manufacturing, improving scalability and commercialisation of antibody therapies.

15:20

Selected Poster Presentation: AI In-Silico Screening Improves the Success Rate of Recombinant Protein Production

Evgeny Tankhilevich, Machine Learning Data Scientist, Industry Partnerships, EMBL-EBI

15:50 Refreshment Break in the Exhibit Hall with Poster Viewing

ALIGNING DATA AND BIOLOGY FOR INNOVATIVE R&D

16:35 Removing Remaining Limitations for Plant-Based Expression of Biologicals and Unlocking Large-Scale, Commercial-Grade Cell Culture Potential

Christian Sievert, Head of Strain Development, eleva GmbH

Plant-based cell lines present a promising alternative for producing biological therapies, yet the number of approved plant-made pharmaceuticals (PMPs) remains limited. Moss-based expression systems show potential due to their precise genetic modifications, complex post-translational capabilities, and controlled cultivation environments. Recent research has led to an enhanced moss cell line with improved growth rates, aiming to achieve CHO-like expression levels and enabling scalability without the need for artificial lighting.

16:50 Unlocking a New Era of Automated Protein and Antibody Purification with AmMag Quatro Solutions: Scalable, Flexible, Validated

Luciana Rosselli Murai, PhD, Head of FAS for Products and Instruments, GenScript

Efficient protein and antibody purification is vital for drug target discovery, candidate development, and fundamental research, yet traditional chromatography can be slow, laborious, and limited in yield. The AmMag Quatro Automated System, with versatile magnetic beads, enables mini to midi-scale purification directly from cell culture, delivering high yield, reproducible results, flexible protocols, minimal hands-on time, and accelerated productivity.

17:05

Connecting Data, People, and Process: The Digital Transformation of Protein Production

Dominik Schneider, PhD, Senior Manager, R&D Enabling Technology, CSL Behring Innovation

Discover how CSL Behring’s digital ecosystem accelerates scientific progress in protein production. This presentation offers an overview of integrated digital tools and platforms that streamline workflows, enhance collaboration, and improve data management. Through real-world examples, learn how optimised processes drive key performance indicators, enabling more efficient, scalable, and innovative R&D efforts that support breakthrough therapies and advance biopharmaceutical development.

17:35 Design of Experiments (DoE)-Based Process Optimization in CHO Fed-Batch Cultures

Neha Mishra, Senior Scientist, Revvity

CHO cells remain the industry standard for biologics production, yet increasing titer demands require continuous process optimization beyond traditional one-factor-at-a-time (OFAT) approaches that miss critical parameter interactions. In this presentation, we explore design of experiments (DoE) as a strategy to optimize CHO fed-batch culture conditions. We will present the concept of DoE and how it can be integrated with multivariate analysis (MVA) to understand the combined effects of metabolites, viability, and cell density on culture performance. This approach could provide a robust framework for process development teams seeking to maximize CHO culture performance while streamlining laboratory operations. The CHOSOURCE™ Platform is available for research, clinical, diagnostic, and commercialization applications, including services, under specific licenses from Revvity.

17:50 FEATURED PANEL DISCUSSION:

Beyond the Bench: Making Data Work for Protein Science

PANEL MODERATOR:

Nicola Burgess-Brown, PhD, Professorial Research Fellow, UCL, London; COO, Protein Sciences, Structural Genomics Consortium

Data scientists view data in black and white, while protein scientists consider the grey. Hear from both disciplines as they address:

  • Can we enhance protein production using machine learning?
  • What are the main challenges?
  • What data should we capture, in what format, and for what purpose?
  • How do we simplify data capture to encourage data entry and consistency?
  • How do we reduce the need to curate (“clean up”) the data before applying ML?
  • How much data is enough to apply ML algorithms to protein production?
  • The importance of including negative data!

PANELISTS:

Christopher Cooper, DPhil, Founder, Protein Sciences, Enzymogen Consulting

Lovisa Holmberg Schiavone, PhD, Director, Protein Sciences, Structure & Biophysics, Discovery Sciences, R&D, AstraZeneca

James D. Love, PhD, Vice President, Cross Modality Workflows, Novo Nordisk AS

Diego Oyarzún, PhD, Professor of Computational Biology, University of Edinburgh

18:35 Welcome Reception in the Exhibit Hall with Poster Viewing

19:35 Close of Leveraging Data Science for Enhanced Expression and Production Conference




