Scores for assessing predicted structures
Scores, so many scores
It’s worth recapping the main scores that AlphaFold (and related structure prediction tools) use to assess confidence, since the same scores are also used for generating and filtering de novo binder designs.
pLDDT - predicted “local distance difference test”
- based on the lDDT-Cα score for comparing local Cα distances between predicted and experimental structures
- ‘Real’ lDDT-Cα scores are the fraction of Cα-Cα distances in the experimental structure (within a 15 Å inclusion radius) that are preserved in the prediction, averaged over four tolerance thresholds of 0.5 Å, 1 Å, 2 Å, and 4 Å
- AlphaFold2 was trained to predict lDDT-Cα as a measure of local confidence, using real lDDT values calculated between ground truth experimental structures and its own predictions.
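As a rough illustration of the ‘real’ lDDT-Cα calculation described above, here is a minimal pure-Python sketch. The function name `lddt_ca`, the coordinate format (lists of (x, y, z) tuples) and the toy coordinates are assumptions made for this example; the 15 Å inclusion radius follows the original lDDT definition. It is not the code used by AlphaFold or by any particular scoring tool.

```python
import math

def lddt_ca(pred, ref, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Per-residue lDDT-Cα (0 to 1) for two equal-length lists of Cα coordinates.

    pred, ref: lists of (x, y, z) tuples for the predicted and reference
    structures, in the same residue order. No superposition is needed,
    because only internal distances are compared.
    """
    n = len(ref)
    scores = []
    for i in range(n):
        preserved = 0   # threshold checks passed, summed over neighbours
        considered = 0  # neighbours within the inclusion radius in the reference
        for j in range(n):
            if i == j:
                continue
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= cutoff:
                continue
            diff = abs(math.dist(pred[i], pred[j]) - d_ref)
            considered += 1
            preserved += sum(diff < t for t in thresholds)
        if considered:
            scores.append(preserved / (considered * len(thresholds)))
        else:
            scores.append(0.0)  # assumption: isolated residues score zero
    return scores

# Toy example: three residues in a line, with the last one displaced by 3 Å.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
print(lddt_ca(pred, ref))  # [0.625, 0.625, 0.25]
```

The displaced residue gets the lowest score, while its neighbours are only partly penalised, which is what makes the score ‘local’.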
pAE - predicted aligned error
- the confidence in the relative position of each residue pair in the predicted structure, measured in Ångströms (Å)
- the expected positional error at residue X, if the predicted and ground truth structures were aligned on residue Y.
- Usually visualized as a 2D heatmap of the predicted aligned error for each residue pair.
pae_interaction - inter-chain predicted aligned error
- sometimes called iPAE or i_pAE
- The average pAE over the inter-chain residue pairs in the pAE matrix, i.e. pairs with one residue in the target and one in the binder.
- Put another way, the average pAE over every pair that is not an intra-chain pair.
- a measure of confidence in the relative arrangement of the target and binder.
- the ‘other’ iPAE
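The inter-chain averaging described above can be sketched in a few lines. This is a minimal example, assuming the pAE matrix is available as a nested list and that each residue has a chain label; the function name `pae_interaction` and the toy values are made up for illustration, and individual design pipelines may differ in details.

```python
def pae_interaction(pae, chain_ids):
    """Mean pAE over all residue pairs (i, j) that sit on different chains.

    pae: square matrix (list of lists) of predicted aligned errors in Å.
    chain_ids: one chain label per residue, in the same order as the matrix.
    """
    total = 0.0
    count = 0
    for i, chain_i in enumerate(chain_ids):
        for j, chain_j in enumerate(chain_ids):
            if chain_i != chain_j:  # skip intra-chain pairs
                total += pae[i][j]
                count += 1
    if count == 0:
        raise ValueError("no inter-chain pairs found")
    return total / count

# Toy complex: two binder residues (chain A) and two target residues (chain B).
chain_ids = ["A", "A", "B", "B"]
pae = [
    [0, 1, 10, 12],
    [1, 0, 14, 16],
    [20, 22, 0, 1],
    [24, 26, 1, 0],
]
print(pae_interaction(pae, chain_ids))  # 18.0
```

Note that the pAE matrix is not symmetric, so both off-diagonal blocks (binder-to-target and target-to-binder) contribute to the average.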
pTM - predicted template modelling (TM) score
- an overall measure of structure accuracy
- AlphaFold2 (ptm) models, and AlphaFold-Multimer models, are trained to predict the TM-score between the predicted and experimental structures.
- the pTM score is derived from the predicted aligned error output (the per-pair error distribution that the pAE matrix summarises)
- The original, non-predicted TM-score is based on the Cα distances between the predicted and experimental structures, with poorly superimposing Cα atoms downweighted and a scaling factor that normalises for the length of the protein (it is reported for the superimposition that maximises the TM-score).
- TM-score is less sensitive to outliers than RMSD (flexible loops and tails have a smaller impact), and is largely independent of protein length
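To make the downweighting and length normalisation concrete, here is a minimal sketch of the standard TM-score formula applied to Cα distances from an already superimposed pair of structures. The search for the best superimposition is left out, and the function names, the short-chain guard (`min_len=19`) and the toy distances are assumptions made for this example.

```python
def tm_d0(target_len, min_len=19):
    """Length-dependent distance scale d0 (Å) used by the TM-score.

    Lengths are clipped at min_len so that d0 stays positive for very short
    chains (a guard added for this sketch).
    """
    n = max(target_len, min_len)
    return 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, target_len):
    """TM-score for one fixed superposition.

    distances: Cα-Cα distances (Å) for the aligned residue pairs.
    target_len: number of residues in the reference (target) structure.
    """
    d0 = tm_d0(target_len)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_len

# 100 residues: 98 superimpose well (1 Å), 2 belong to a floppy tail (20 Å).
distances = [1.0] * 98 + [20.0] * 2
print(round(tm_d0(100), 2))  # 3.65
print(tm_score(distances, 100))
```

Each residue contributes at most 1/L to the total, so a couple of badly placed tail residues cost very little, whereas the same residues would dominate an RMSD.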
ipTM - interface predicted template modelling (TM) score
- like the pTM score, but considers only pairs of residues between chains, not within chains (inter-chain pairs only)
- a measure of confidence in the relative arrangement of the target and binder.
- a variant, ipSAE, proposed by Roland Dunbrack, is calculated in a similar way to ipTM but only considers the most confident pAE pairs in the calculation (e.g. pAE < 10 Å). This reduces the impact of low-confidence ‘disordered’ regions on the score.
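The idea of keeping only the confident inter-chain pairs can be sketched as follows. This is a simplified, ipSAE-style illustration and not the published implementation: the function names, the way the distance scale is derived from the number of retained pairs, and the short-chain guard are all assumptions made for this example.

```python
def d0_from_count(n_res, min_res=19):
    """TM-score style distance scale; clipped for small counts (assumption)."""
    n = max(n_res, min_res)
    return 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8

def ipsae_like(pae, chain_ids, src, dst, cutoff=10.0):
    """ipSAE-style score from chain `src` to chain `dst`.

    For each residue i in `src`, keep only partners j in `dst` with
    pae[i][j] < cutoff, convert each retained pAE to a TM-like term and
    average them. The best-scoring residue i gives the final score.
    """
    best = 0.0
    for i, chain_i in enumerate(chain_ids):
        if chain_i != src:
            continue
        kept = [pae[i][j] for j, chain_j in enumerate(chain_ids)
                if chain_j == dst and pae[i][j] < cutoff]
        if not kept:
            continue  # no confident partners for this residue
        d0 = d0_from_count(len(kept))
        score = sum(1.0 / (1.0 + (e / d0) ** 2) for e in kept) / len(kept)
        best = max(best, score)
    return best

# Toy complex: residue 0 of chain A is confidently placed relative to chain B,
# residue 1 is not (pAE of 30 Å to every residue in B).
chain_ids = ["A", "A", "B", "B"]
pae = [
    [0, 1, 0, 0],
    [1, 0, 30, 30],
    [0, 30, 0, 1],
    [0, 30, 1, 0],
]
print(ipsae_like(pae, chain_ids, "A", "B"))  # 1.0
```

Because the low-confidence pairs are dropped before averaging, a disordered stretch does not drag the score down in the way it would if every inter-chain pair were included.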
Cα-RMSD - root mean square deviation of Cα atoms
- calculated between two superimposed structures: the square root of the mean squared distance (Å) between the selected pairs of Cα atoms in the two structures
- can be sensitive to outliers (flexible loops and tails), but outliers are often excluded from the calculation (e.g. ChimeraX ‘matchmaker’ reports RMSD values for both ‘pruned’ and ‘all’ pairs of atoms)
- depends non-linearly on protein length (longer proteins will tend to have higher RMSD values)
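A minimal sketch of the RMSD calculation for two structures that have already been superimposed is shown below. The superposition step itself is omitted, and the function name `ca_rmsd` and the optional `max_dist` argument are hypothetical: `max_dist` is a simple one-pass filter to show the effect of excluding outliers, not ChimeraX's iterative pruning.

```python
import math

def ca_rmsd(coords_a, coords_b, max_dist=None):
    """RMSD (Å) between paired Cα coordinates of two superimposed structures.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples.
    max_dist: if given, pairs further apart than this are left out.
    """
    sq_dists = [sum((p - q) ** 2 for p, q in zip(a, b))
                for a, b in zip(coords_a, coords_b)]
    if max_dist is not None:
        sq_dists = [s for s in sq_dists if s <= max_dist ** 2]
    if not sq_dists:
        raise ValueError("no atom pairs left to compare")
    return math.sqrt(sum(sq_dists) / len(sq_dists))

# Two atom pairs that are 3 Å and 4 Å apart.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 3.0), (1.0, 0.0, 4.0)]
print(ca_rmsd(a, b))                # sqrt(12.5), about 3.54 (the plain mean is 3.5)
print(ca_rmsd(a, b, max_dist=3.5))  # 3.0, only the closer pair is kept
```

Squaring before averaging is what gives large deviations their extra weight, which is why a few flexible residues can inflate the RMSD of an otherwise well-matched pair of structures.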
To date, none of these scores (or others) have been found to strongly associate with binding affinity in the high affinity ranges (<<10 μM) that we are usually interested in.
Many do tend to associate with binary binding/non-binding outcomes (e.g. predicting an affinity below or above 10 μM), so they are still useful for guiding and filtering designs in silico.
Resources
- EBI AlphaFold tutorial: good practical overviews of pLDDT, pAE, TM, pTM, ipTM.