BoltzGen Workflow

The boltzgen.nf workflow automates the design of binders using the BoltzGen generative model.
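Compared with running boltzgen run directly, its main advantages are better parallelization across multiple GPUs and better resource allocation on HPC clusters, because designs are generated in independent batches that can be scheduled separately.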
It supports the protein-anything, peptide-anything, protein-small_molecule, and nanobody-anything protocols.
Overview
The BoltzGen workflow performs the following steps, mirroring those of boltzgen run:
- Design: Generates binder backbone structures based on a YAML configuration.
- Inverse Folding: Generates sequences for the designed backbones.
- Folding: Re-folds the designed binder in complex with the target.
- Design Folding: Re-folds the designed binder alone.
- Affinity: (Optional; protein-small_molecule protocol only) Calculates affinity scores for protein-small molecule complexes.
- Merge batches: (nf-binder-design boltzgen.nf only) Merges the independently generated batches of designs, ready for analysis and filtering.
- Analysis & Filtering: Aggregates scores and filters designs based on user-defined criteria.
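The order of these stages, and which of them run once per batch versus once after all batches finish, can be summarised with a small Python sketch. It is illustrative only: the function name pipeline_stages is hypothetical, it does not call Nextflow, and treating Affinity as a per-batch stage is an assumption (the Key Outputs section only lists per-batch folders for the first four steps).

```python
SUPPORTED_PROTOCOLS = (
    "protein-anything",
    "peptide-anything",
    "protein-small_molecule",
    "nanobody-anything",
)


def pipeline_stages(protocol: str = "protein-anything") -> tuple[list[str], list[str]]:
    """Return (per_batch_stages, final_stages) for a protocol.

    Hypothetical helper that mirrors the step list above.
    """
    if protocol not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    # Stages run independently for each batch of designs.
    per_batch = ["design", "inverse_folding", "folding", "design_folding"]
    if protocol == "protein-small_molecule":
        # Assumption: affinity scoring also runs per batch, and only for this protocol.
        per_batch.append("affinity")
    # Stages run once, after all batches have finished.
    final = ["merge", "analysis", "filtering"]
    return per_batch, final


per_batch, final = pipeline_stages("protein-small_molecule")
print(per_batch)  # ['design', 'inverse_folding', 'folding', 'design_folding', 'affinity']
print(final)      # ['merge', 'analysis', 'filtering']
```

The split into two lists reflects the point of the workflow: everything in the first list can run on separate GPUs in parallel, while the second list needs all batches to be complete.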
General Information
Command-line Options
You can see available options with --help:
nextflow run boltzgen.nf --help
YAML Configuration
The BoltzGen workflow is controlled by a YAML configuration file passed via --config_yaml. This file defines the input structure, constraints, and design objectives. Refer to the BoltzGen documentation for how to prepare your configuration file.
Note: ⚠️ BoltzGen expects residues to be specified as indices starting at 1, irrespective of the numbering in your PDB/mmCIF file. To make choosing hotspot residues in ChimeraX, PyMOL, Molstar, etc. easier, you may want to use bin/renumber_chains.py to renumber your input PDB sequentially starting at 1.
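To see how a residue's author numbering maps to a 1-based index, here is a minimal Python sketch. It assumes the index is simply the residue's position among the residues present in its chain, counted from 1 in file order; if your structure has missing residues, check the BoltzGen documentation for how gaps are counted. The helper one_based_indices is hypothetical and is not bin/renumber_chains.py.

```python
def one_based_indices(residues):
    """Map (chain_id, resnum, insertion_code) -> 1-based position within its chain.

    residues: iterable of residue identifiers in file order. Repeated entries
    (e.g. one per atom) are ignored. Hypothetical helper for illustration.
    """
    mapping = {}
    counters = {}
    for chain_id, resnum, icode in residues:
        key = (chain_id, resnum, icode)
        if key in mapping:
            continue  # this residue already has an index
        counters[chain_id] = counters.get(chain_id, 0) + 1
        mapping[key] = counters[chain_id]
    return mapping


# Chain A starts at residue 25 and has an insertion code; chain B starts at 100.
residues = [("A", 25, ""), ("A", 26, ""), ("A", 26, "A"), ("A", 27, ""), ("B", 100, "")]
idx = one_based_indices(residues)
print(idx[("A", 27, "")])   # 4
print(idx[("B", 100, "")])  # 1
```

Insertion codes are the usual reason author numbering and positional indices drift apart, which is why the example includes one.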
Usage
Basic Execution
To run the workflow with a configuration file:
nextflow run boltzgen.nf --config_yaml config/my_design.yaml
Key Parameters
- --config_yaml: (Required) Path to the BoltzGen YAML configuration file.
- --outdir: Output directory for results (default: results).
- --design_name: Name of the design, used for output file prefixes. Defaults to the basename of the config file.
- --protocol: Protocol type. Options: protein-anything (default), peptide-anything, protein-small_molecule, nanobody-anything.
- --num_designs: Total number of designs to generate (default: 100).
- --inverse_fold_num_sequences: Number of sequences to generate per backbone during the inverse folding step (default: 1).
- --batch_size: Number of designs to process per batch (default: 10).
- --budget: Final number of designs to keep after diversity optimization and filtering (default: 10).
- --devices: Number of GPU devices to use (default: 1).
- --num_workers: Number of DataLoader workers.
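As a rough planning aid, this Python sketch shows how --num_designs and --batch_size could translate into independent batches. It assumes ceiling division with a smaller final batch, and that --num_designs counts backbones, so the total number of sequences is --num_designs multiplied by --inverse_fold_num_sequences. Neither rule is stated above, and both function names are hypothetical.

```python
import math


def plan_batches(num_designs: int = 100, batch_size: int = 10) -> list[int]:
    """Split num_designs into batches of at most batch_size (assumed splitting rule)."""
    if num_designs <= 0 or batch_size <= 0:
        raise ValueError("num_designs and batch_size must be positive")
    n_batches = math.ceil(num_designs / batch_size)
    sizes = [batch_size] * n_batches
    # Any remainder goes into a smaller final batch.
    sizes[-1] = num_designs - batch_size * (n_batches - 1)
    return sizes


def expected_sequences(num_designs: int, inverse_fold_num_sequences: int = 1) -> int:
    # Assumption: each backbone receives inverse_fold_num_sequences sequences.
    return num_designs * inverse_fold_num_sequences


print(len(plan_batches(100, 10)))  # 10
print(plan_batches(105, 10)[-1])   # 5
print(expected_sequences(100, 3))  # 300
```

With the defaults (100 designs, batch size 10) this gives 10 batches, each of which is a unit of work that can be scheduled on its own GPU.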
Filtering Parameters
- --alpha: Trade-off for sequence diversity selection: 0.0 = quality only, 1.0 = diversity only.
- --filter_biased: Remove amino-acid composition outliers (default: true; use --filter_biased false to disable).
- --metrics_override: Per-metric inverse-importance weights for ranking. Format: metric_name=weight (e.g., 'plip_hbonds_refolded=4' 'delta_sasa_refolded=2').
- --additional_filters: Extra hard filters. Format: feature>threshold or feature<threshold (e.g., 'design_ALA>0.3' 'design_GLY<0.2').
- --size_buckets: Constraint on the maximum number of designs in each size range. Format: min-max:count (e.g., '10-20:5' '20-30:10').
- --refolding_rmsd_threshold: Threshold used for RMSD-based filters (lower is better).
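To make the three string formats concrete, here is a minimal Python sketch that parses them into structured values. The function names are hypothetical and this is not the workflow's own parser. It deliberately stops at parsing: whether an --additional_filters expression describes designs to keep or to reject, and whether size-bucket bounds are inclusive, is not specified here.

```python
import re

# feature name, comparison operator, numeric threshold (e.g. design_ALA>0.3)
_FILTER_RE = re.compile(r"(\w+)\s*([<>])\s*(\S+)")
# min-max:count (e.g. 10-20:5)
_BUCKET_RE = re.compile(r"(\d+)-(\d+):(\d+)")


def parse_metrics_override(items: list[str]) -> dict[str, float]:
    """'metric_name=weight' strings -> {metric_name: weight}."""
    weights = {}
    for item in items:
        name, sep, weight = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected metric_name=weight, got {item!r}")
        weights[name] = float(weight)
    return weights


def parse_additional_filters(items: list[str]) -> list[tuple[str, str, float]]:
    """'feature>threshold' or 'feature<threshold' -> (feature, op, threshold)."""
    filters = []
    for item in items:
        m = _FILTER_RE.fullmatch(item.strip())
        if m is None:
            raise ValueError(f"expected feature>threshold or feature<threshold, got {item!r}")
        feature, op, threshold = m.groups()
        filters.append((feature, op, float(threshold)))
    return filters


def parse_size_buckets(items: list[str]) -> list[tuple[int, int, int]]:
    """'min-max:count' -> (min, max, count)."""
    buckets = []
    for item in items:
        m = _BUCKET_RE.fullmatch(item.strip())
        if m is None:
            raise ValueError(f"expected min-max:count, got {item!r}")
        lo, hi, count = (int(g) for g in m.groups())
        if lo > hi:
            raise ValueError(f"bucket minimum exceeds maximum in {item!r}")
        buckets.append((lo, hi, count))
    return buckets


print(parse_additional_filters(["design_ALA>0.3", "design_GLY<0.2"]))
```

On the command line, quoting each value as in the examples above matters: unquoted, the shell would treat > and < in --additional_filters as redirections.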
Key Outputs
The results directory (or specified --outdir) will contain:
- params.json: A record of the parameters used for the run.
- boltzgen/batches: Each independent design batch, with a folder for each step (design, inverse_folding, folding, design_folding).
- boltzgen/merged: All batches merged, after the 'analysis' step.
- boltzgen/filtered: Final designs are in filtered/final_ranked_designs after the 'filtering' step.
boltzgen/merged should be equivalent to the output of running boltzgen run directly (without Nextflow); see the BoltzGen docs for details on the pipeline output.
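A minimal Python sketch for inspecting a finished run, assuming the layout listed above: params.json at the top of the output directory and final designs under boltzgen/filtered/final_ranked_designs. The function names are hypothetical, and the keys inside params.json (for example protocol) are assumed to match the option names. Files are sorted by name only; whether file names encode the rank is not specified here.

```python
import json
from pathlib import Path


def load_run_params(outdir: str = "results") -> dict:
    """Read the parameter record written to <outdir>/params.json."""
    with open(Path(outdir) / "params.json", encoding="utf-8") as fh:
        return json.load(fh)


def final_ranked_designs(outdir: str = "results") -> list[Path]:
    """List files in <outdir>/boltzgen/filtered/final_ranked_designs, sorted by name."""
    folder = Path(outdir) / "boltzgen" / "filtered" / "final_ranked_designs"
    return sorted(p for p in folder.iterdir() if p.is_file())


# Example (run from the directory that contains results/):
# params = load_run_params("results")
# print(params.get("protocol"))  # key name assumed
# for path in final_ranked_designs("results"):
#     print(path.name)
```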
Re-running Filtering
Use boltzgen_filter.nf to re-run filtering on existing results with different parameters:
nextflow run boltzgen_filter.nf --run results/boltzgen/merged --budget 20 --alpha 0.05
If --config_yaml and --protocol are not specified, they are auto-detected from params.json. All filtering parameters from the main workflow are available.
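The auto-detection rule (an explicit option wins; otherwise the value recorded in params.json is used) can be sketched in Python as a pure function. The function name, the dict-based inputs, and the assumption that params.json stores these values under keys matching the option names are illustrative, not the workflow's implementation.

```python
def resolve_filter_params(cli_args: dict, recorded: dict,
                          auto_keys: tuple[str, ...] = ("config_yaml", "protocol")) -> dict:
    """Fill unset auto-detected options from recorded params.json values.

    Explicit command-line values always win; only keys that are missing
    or None in cli_args are filled from the record.
    """
    resolved = dict(cli_args)
    for key in auto_keys:
        if resolved.get(key) is None:
            if key not in recorded:
                raise KeyError(f"--{key} not given and not found in params.json")
            resolved[key] = recorded[key]
    return resolved


recorded = {"config_yaml": "config/my_design.yaml", "protocol": "protein-anything"}
print(resolve_filter_params({"budget": 20, "alpha": 0.05}, recorded))
print(resolve_filter_params({"protocol": "peptide-anything"}, recorded)["protocol"])  # peptide-anything
```

Failing loudly when a value is neither given nor recorded avoids silently filtering with the wrong protocol.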
Results are saved to results/boltzgen/filtered/final_ranked_designs/.
Examples
The examples/ directory contains complete working examples for BoltzGen workflows:
- examples/boltzgen-protein: Protein binder design (protocol protein-anything)
- examples/boltzgen-cyclic-peptide: Cyclic peptide binder design (protocol peptide-anything)
- examples/boltzgen-small-molecule-binder: Small molecule binder design (protocol protein-small_molecule)
See examples/README.md for details.