Cambridge Healthtech Institute’s 9th Annual

Leveraging Data Science for Enhanced Protein Expression

Turning Data into Predictable Protein Production

17 November 2026 ALL TIMES WET (GMT/UTC)

Protein expression has entered a data-defined era where engineering precision and expression efficiency are powered by high-quality data generation and advanced analytics. Cambridge Healthtech Institute’s 9th Annual Leveraging Data Science for Enhanced Protein Expression conference at PEGS Europe convenes discovery researchers applying high-throughput experimentation, automated workflows, and integrated multi-omics datasets to build predictive models for recombinant protein expression, host optimization, and scalable production. Designed for protein scientists driving discovery and development, this conference emphasizes transforming complex experimental data into actionable insights to accelerate timelines, improve yields, and guide decision-making across the expression pipeline.

Recommended Training Seminar*
Monday, 16 November, 08:30 – 17:00
TS4A: Protein Production 201: Applying End-to-End CEPA Workflow
*Separate registration required. See training seminars page for details. All training seminars take place in-person only.





Tuesday, 17 November

Registration and Morning Coffee

INTEGRATING DATA PIPELINES TO ENABLE FASTER DATA-DRIVEN DECISIONS

Chairperson's Welcome Remarks

Patrícia Gomes-Alves, PhD, Lab Head, Animal Cell Technology Unit, Instituto de Biologia Experimental Tecnologica (iBET); Lab Head, Sanofi Satellite Lab, iBET

FEATURED SPEAKER: "EXPERT": A Structured Framework for Capturing Protein Expression and Purification Data to Develop Machine Learning Models

Nicola Burgess-Brown, PhD, Professorial Research Fellow, UCL, London; COO, Protein Sciences, Structural Genomics Consortium

A lack of consistent, complete, and standardised experimental reporting limits computational prediction of protein expression outcomes and requires extensive dataset curation. We propose a structured template for capturing protein expression and purification data to improve reproducibility, support machine learning applications, and reduce empirical construct screening. The framework prioritises metadata into critical, highly enabling and optional categories, and includes negative data to improve dataset quality. By generating diverse, machine-usable datasets, this approach aims to support generalisable predictive models, establish a trusted protein production repository and accelerate the path from digital design to purified protein.
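As a rough illustration of what a tiered, machine-usable record could look like, here is a minimal Python sketch. The abstract names only the three priority tiers (critical, highly enabling, optional) and the inclusion of negative data; every field name below, and the choice of which fields sit in which tier, is a hypothetical placeholder rather than part of the proposed template.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExpressionRecord:
    # Tier 1, "critical" (field names are assumed for illustration)
    construct_sequence: str
    host: str
    soluble_expression: bool  # False captures a negative result
    # Tier 2, "highly enabling" (assumed examples)
    induction_temp_c: Optional[float] = None
    purified_yield_mg_per_l: Optional[float] = None
    # Tier 3, "optional" free-form extras
    notes: dict = field(default_factory=dict)


# Text fields that must be non-empty before a record is accepted (assumed rule).
CRITICAL_TEXT_FIELDS = ("construct_sequence", "host")


def missing_critical(record: ExpressionRecord) -> List[str]:
    """Return the names of critical text fields that are empty."""
    return [name for name in CRITICAL_TEXT_FIELDS if not getattr(record, name)]
```

A failed expression attempt is stored like any other record, with `soluble_expression=False`, so negative outcomes stay in the dataset instead of being discarded.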

IVTT-Accelerated Protein Discovery: Generating Fit-for-Purpose AI Training Data

Frederikke Bjergvang Flagstad, Senior Automation Scientist, Cross Modality Workflows, Novo Nordisk AS

Training AI models for protein prediction can be slow and cumbersome: design cycles (design, make, test, and analyse) often take months and require specialist expertise across multiple teams. The data that research teams already hold is often unsuitable for training models because it was generated to support a specific project and is not standardised. Using IVTT (cell-free protein expression), integrated liquid handlers, and automated data analysis, we streamline the process and generate fit-for-purpose data. In 7 days, we go from DNA to analysed data.

Building the Protein Lab of the Future at AstraZeneca: Automation First Science and AI Ready Data at Scale

Stan Blein, PhD, Senior Director and Head, Protein Sciences & Analytics, AstraZeneca

AstraZeneca’s Protein Sciences & Analytics is building the lab of the future based on an automation-first operating model and AI-ready data capture. The enterprise-scale blueprint is grounded in harmonized ways of working, small- and large-scale HT expression platforms, and integrated data pipelines. The ongoing transformation has already shown measurable impact on DMTA timelines and first-pass success and aims to be transferable across modalities and sites—offering a clear path to enable faster, data-driven decisions.

Grand Opening Coffee Break in the Exhibit Hall with Poster Viewing

DECODING GENETIC RULES TO BOOST EXPRESSION

High-Throughput Biophysical Data Generation as the Missing Link in AI-Driven Protein Design

Nikolay Dobrev, PhD, Founder & CEO, Data Powered Therapeutics GmbH

High-throughput biophysical data generation is emerging as the missing link in AI-driven protein design. We present a platform for scalable production of diverse drug target proteins combined with multi-modal characterisation, including nano-DSF, BLI, SPR, and FIDA. Our approach captures protein stability, aggregation, and binding interactions—particularly for VHHs—at high granularity. By generating paired datasets that link sequence, expression, and biophysical behaviour, we expand both the diversity and resolution of training data. This data-centric strategy enables more robust, generalisable, and predictive AI models, helping to unlock design capabilities in data-sparse and challenging regions of protein space.

Nanobody Discovery with AI-designed Soluble Analogues of Membrane Proteins

Nicolas M. Goldbach, Research Scientist, Lab of Protein Design & Immunoengineering, EPFL Lausanne

Luncheon in the Exhibit Hall with Poster Viewing

BRIDGING COMPUTATIONAL DESIGN AND PROTEIN PRODUCTION

Chairperson's Remarks 

Rivka Isaacson, PhD, Professor of Molecular Biophysics, Department of Chemistry, King’s College London

From Data Science to Fine-Tuning Codon Optimization for High-Yield Protein Production in E. coli

Greg Boel, PhD, Principal Investigator, UMR8261, CNRS / Université Paris Cité

Improving protein production is critical in today’s rapidly advancing protein technology landscape. Using experimental data, we developed a codon-efficiency metric that correlates with the levels of native and recombinant proteins in Escherichia coli. Codon content influences protein expression more than mRNA folding, except in the first few codons, by modulating translation and mRNA degradation. An A-rich, G-poor base composition in the first six codons enhances expression and mRNA stability. We integrated these findings into a sequence optimisation algorithm that predicts the expression of a given DNA sequence in E. coli and proposes strategies for high-yield protein production.
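To make the start-region observation concrete, the following Python sketch measures the A and G content of the first six codons of a coding sequence. It is only a base-composition check under stated assumptions, not the speakers' codon-efficiency metric or optimisation algorithm: the function name, the decision to count the start codon within the six-codon window, and the returned keys are all hypothetical.

```python
def start_region_composition(cds: str, n_codons: int = 6) -> dict:
    """Fraction of A and G bases in the first n_codons of a coding sequence.

    Assumes the sequence begins at the start codon and that the window
    includes it; RNA input (U) is converted to DNA (T) first.
    """
    window = 3 * n_codons
    region = cds.upper().replace("U", "T")[:window]
    if len(region) < window:
        raise ValueError("coding sequence shorter than the requested window")
    return {
        "a_fraction": region.count("A") / window,
        "g_fraction": region.count("G") / window,
    }
```

Comparing synonymous variants of the same N-terminus with this function shows which one is more A-rich and G-poor; whether that translates into higher expression for a given construct still has to be tested experimentally.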

Deep Mutational Learning for the Precision Engineering of Enzymes and Biosensors

Alperen Dalkiran, PhD, Postdoctoral Research Associate, School of Informatics, University of Edinburgh

Protein engineering increasingly demands methods that can efficiently navigate vast sequence spaces to identify variants with desired properties. We developed an integrated deep mutational learning pipeline combining high-throughput experimental fitness landscapes with protein language models to engineer proteins across two distinct applications: enhancing myoglobin peroxidase activity through electron-hole hopping pathways, and tuning the sensitivity and dynamic range of HucR-based biosensors. In both systems, neural network models trained on sort-seq and EP-Seq data accurately predicted improved variants, achieving up to 100-fold enrichment over random mutagenesis. Our results establish a generalisable framework for accelerating precision protein engineering.

Data-Driven Optimisation of High-Throughput Protein Expression Systems

Julian Englert, MS, Co-Founder and CEO, Adaptyv Biosystems

We present our experience expressing over 80,000 AI-designed proteins using cell-free systems, systematically evaluating expression tags, codon optimisation strategies, and system variants. We then describe how these workflows were integrated into our automated cloud laboratory for high-throughput protein validation, a fully automated platform capable of expressing arbitrary customer proteins at scales from sub-microgram to milligram quantities.

Refreshment Break in the Exhibit Hall with Poster Viewing

Deep Learning and Mechanistic Models to Optimise Heterologous Protein Expression Titres

Eran Miller, Co-Founder & CBO, MNDL Bio

PANEL DISCUSSION:
Beyond the Bench: Harnessing Data to Advance Protein Science

Panel Moderator:

Nicola Burgess-Brown, PhD, Professorial Research Fellow, UCL, London; COO, Protein Sciences, Structural Genomics Consortium

Panelists:

Christopher Cooper, DPhil, Senior Lecturer in Biotechnology, University of Surrey

Nikolay Dobrev, PhD, Founder & CEO, Data Powered Therapeutics GmbH

Welcome Reception in the Exhibit Hall with Poster Viewing

Close of Leveraging Data Science for Enhanced Protein Expression Conference


For more details on the conference, please contact:

Mary Ann Brown
Executive Director, Conferences
Cambridge Healthtech Institute
Phone: (+1) 781-697-7687
Email: mabrown@healthtech.com

For sponsorship information, please contact:

Companies A-K
Jason Gerardi
Sr. Manager, Business Development
Cambridge Healthtech Institute
Phone: (+1) 781-972-5452
Email: jgerardi@healthtech.com

Companies L-Z
Ashley Parsons
Manager, Business Development
Cambridge Healthtech Institute
Phone: (+1) 781-972-1340
Email: ashleyparsons@healthtech.com