RFdiffusion: analysing a full design run

“Here’s one we prepared earlier”: 500 designs generated with the RFdiffusion binder design workflow of nf-binder-design are in /scratch2/pd27/shared/examples/full_runs/pdl1-rfd.

Challenge: viewing the RFdiffusion pipeline results

View the table combined_scores.tsv. How many designs have a favourable pae_interaction score? How many remain if we also apply a second threshold on plddt_binder?

Design run summary and scoring

Designs with both of these scores:

pae_interaction < 10
binder pLDDT (plddt_binder) > 80

are considered to have a higher chance of binding (at an affinity of < 10 μM).

You may use additional criteria, but these two scores are the primary filters for selecting and ranking designs.
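The challenge question and the two primary filters can be answered with a few lines of pandas. This is a minimal sketch that assumes the column headers in combined_scores.tsv are exactly pae_interaction and plddt_binder (the names used above) and that plddt_binder is on a 0 to 100 scale; the function names are hypothetical helpers, not part of nf-binder-design. Ranking by pae_interaction first, with plddt_binder as a tie-breaker, is one reasonable choice rather than a prescribed one.

```python
import pandas as pd

# Primary filter thresholds from the text. Column names are assumed to match
# the headers in combined_scores.tsv; plddt_binder is assumed to be 0-100.
PAE_MAX = 10.0
PLDDT_MIN = 80.0


def count_passing(scores: pd.DataFrame) -> tuple[int, int]:
    """Return (designs passing pae_interaction alone, designs passing both filters)."""
    pae_ok = scores["pae_interaction"] < PAE_MAX
    both_ok = pae_ok & (scores["plddt_binder"] > PLDDT_MIN)
    return int(pae_ok.sum()), int(both_ok.sum())


def filter_and_rank(scores: pd.DataFrame) -> pd.DataFrame:
    """Keep designs passing both filters, ranked by pae_interaction (lowest first),
    breaking ties by plddt_binder (highest first)."""
    passing = scores[
        (scores["pae_interaction"] < PAE_MAX) & (scores["plddt_binder"] > PLDDT_MIN)
    ]
    return passing.sort_values(
        ["pae_interaction", "plddt_binder"], ascending=[True, False]
    )
```

For the example run, load the table from the pdl1-rfd directory with `pd.read_csv("combined_scores.tsv", sep="\t")` and pass the resulting DataFrame to both functions.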

Other scores

rg: radius of gyration, a quick way to filter out extended designs (e.g. long single- or dual-helix designs that often contact the target through only a small part of the binder).
interface_dG: Rosetta interface dG energy. Publications from David Baker's group sometimes use this as a filter.
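To see why rg flags extended designs, it helps to compute it directly. The sketch below uses the plain, unweighted definition over Cα coordinates (root-mean-square distance of each Cα from the centroid); the pipeline's rg column may be computed differently (e.g. over all atoms or mass-weighted), so treat this as illustrative. The two coordinate sets are toy assumptions: a straight 64-residue rod with the ~1.5 Å rise per residue of an α-helix, versus 64 points packed on a cubic grid with 3.8 Å spacing.

```python
import numpy as np


def radius_of_gyration(ca_coords) -> float:
    """Unweighted Rg: sqrt of the mean squared distance of each Cα from the centroid."""
    coords = np.asarray(ca_coords, dtype=float)
    centroid = coords.mean(axis=0)
    return float(np.sqrt(((coords - centroid) ** 2).sum(axis=1).mean()))


# Toy 64-residue "single helix": Cα positions on a straight line, 1.5 Å apart.
rod = np.column_stack([np.arange(64) * 1.5, np.zeros(64), np.zeros(64)])

# Toy 64-residue compact blob: a 4 x 4 x 4 grid with 3.8 Å spacing.
g = np.arange(4) * 3.8
blob = np.array([[x, y, z] for x in g for y in g for z in g])

print(f"rod Rg:  {radius_of_gyration(rod):.1f} Å")   # 27.7 Å
print(f"blob Rg: {radius_of_gyration(blob):.1f} Å")  # 7.4 Å
```

With the same number of residues, the extended rod has almost four times the Rg of the compact arrangement, which is why a simple upper cut-off on rg removes long helical designs. Because Rg grows with chain length, any cut-off is best chosen relative to designs of similar size.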

Binder monomer prediction, complex re-prediction

Alphafold ‘initial guess’ is useful for efficiently filtering and ranking designs because it is much faster than the full Alphafold workflow. However, it compromises on accuracy and can be biased by the complex structure it is given as a starting point.

The final step, which we have not applied in this workshop, is ‘full’ structure prediction using Alphafold (2 or 3) or another predictor such as Boltz-2. Generally, we want to predict, from sequence alone:

a re-prediction of the complex structure
the binder monomer structure

If the target was ‘unnaturally’ trimmed, the complex re-prediction is best done with a complete target domain.
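One way to set up both sequence-only predictions is to write two input jobs per design. The sketch below assumes AlphaFold 3 is the predictor and builds dictionaries following the AlphaFold 3 JSON input format (name, modelSeeds, sequences, dialect, version) as we understand it; check the field names against the documentation for the AF3 version you run, and note that Boltz-2 uses a different input format. The helper names, the design name, and the binder-as-chain-A / target-as-chain-B layout are hypothetical. Following the advice above, target_seq should be the complete target domain rather than the trimmed design target.

```python
import json
from pathlib import Path


def af3_job(name: str, chains: dict[str, str]) -> dict:
    """One AlphaFold 3 input job; `chains` maps a chain ID to a protein sequence.
    Field names assume the AF3 JSON input format (verify for your AF3 version)."""
    return {
        "name": name,
        "modelSeeds": [1],
        "sequences": [
            {"protein": {"id": chain_id, "sequence": seq}}
            for chain_id, seq in chains.items()
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


def jobs_for_design(design: str, binder_seq: str, target_seq: str) -> list[dict]:
    """Two sequence-only jobs per design: the binder+target complex and the
    binder monomer. Binder as chain A, target as chain B (an assumption)."""
    return [
        af3_job(f"{design}_complex", {"A": binder_seq, "B": target_seq}),
        af3_job(f"{design}_monomer", {"A": binder_seq}),
    ]


def write_jobs(jobs: list[dict], out_dir: str) -> None:
    """Write each job to <out_dir>/<name>.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for job in jobs:
        (out / f"{job['name']}.json").write_text(json.dumps(job, indent=2))
```

Since no MSA is supplied in these jobs, AlphaFold 3 would run its own MSA search for each chain; for a de novo binder that search is expected to find little, while the target benefits from it.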

Why predict the binder monomer (from sequence alone)?

If we are aiming for a stable, high-affinity binder, we typically want the binder monomer to adopt the same structure whether free or bound (binders that undergo conformational changes upon binding are interesting, but typically not what we are chasing here).

What to do: compute the Cα RMSD between the ‘free’ binder monomer and the ‘bound’ binder (taken from the complex) after superimposing them. If the RMSD is large (> ~3.5 Å?), we will typically discard the binder design.
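The free-versus-bound comparison is a standard Cα RMSD after optimal rigid-body superposition, which the Kabsch algorithm (SVD of the covariance matrix) provides. This is a minimal NumPy sketch: it assumes you have already extracted matched (N, 3) Cα coordinate arrays for the binder from the monomer prediction and from the complex (e.g. with Biopython or gemmi), with residues in the same order, which holds when both come from the same binder sequence. The 3.5 Å cut-off is the tentative rule of thumb from the text, and `binder_keeps_fold` is a hypothetical helper name.

```python
import numpy as np

RMSD_CUTOFF = 3.5  # Å; the tentative "~3.5 Å?" rule of thumb from the text


def kabsch_rmsd(P, Q) -> float:
    """Cα RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two matched (N, 3) coordinate arrays")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def binder_keeps_fold(free_ca, bound_ca, cutoff: float = RMSD_CUTOFF) -> bool:
    """True if the free monomer and the bound binder agree within `cutoff` Å."""
    return kabsch_rmsd(free_ca, bound_ca) <= cutoff
```

Superimposing before measuring matters: a raw coordinate difference would mostly reflect where each prediction happens to sit in space, not whether the fold has changed.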

Why re-predict the binder+target complex structure?

Alphafold ‘initial guess’ uses the RFdiffusion generated complex as a starting point for structure prediction, and so can be biased toward predicting that binding mode.
Alphafold ‘initial guess’ works in ‘single sequence mode’, without any multiple sequence alignment (MSA) input. This is much faster, but it reduces the accuracy of the target structure in the complex. Running ‘full Alphafold’ (or equivalent) with an MSA for the target helps confirm the predicted complex.

What to do: compute the Cα RMSD between the re-predicted complex and the initial guess complex (and also the Cα RMSDs of the target alone and of the binder alone, each after its own superimposition). If the re-predicted complex differs substantially from the initial guess prediction (> ~3.5 Å?), we have low confidence in the original design and will likely discard it.
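The three comparisons can be computed from one pair of Cα coordinate arrays. This sketch assumes both complexes list the binder residues first (chain A), followed by the target (chain B), in the same residue order; if the re-prediction used a complete target domain but the design used a trimmed one, you would first restrict both to the shared target residues. `compare_complexes` is a hypothetical helper, and the Kabsch routine is repeated so the block runs on its own.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """Cα RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def compare_complexes(initial_guess_ca, repredicted_ca, n_binder: int) -> dict:
    """RMSDs between two predictions of the same complex.

    Assumes rows [0, n_binder) are binder Cα atoms and the remaining rows are
    target Cα atoms, matched one-to-one between the two arrays."""
    ig = np.asarray(initial_guess_ca, dtype=float)
    rp = np.asarray(repredicted_ca, dtype=float)
    return {
        "complex_rmsd": kabsch_rmsd(ig, rp),  # whole complex, one superposition
        "target_rmsd": kabsch_rmsd(ig[n_binder:], rp[n_binder:]),  # target only
        "binder_rmsd": kabsch_rmsd(ig[:n_binder], rp[:n_binder]),  # binder only
    }
```

The per-chain RMSDs and the complex RMSD answer different questions. If the binder and target each keep their shape but the binder docks somewhere else (a changed binding mode), target_rmsd and binder_rmsd stay near zero while complex_rmsd rises, so the complex value is the one that catches it.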

Reuse

CC BY 4.0