Google Cloud Batch
Fusion simplifies and improves the efficiency of Nextflow pipelines in Google Cloud Batch in several ways:
- No need to use the gcloud CLI tool for copying data to and from Google Cloud Storage.
- No need to create custom containers to include the gcloud CLI tool.
- Fusion uses an efficient data transfer and caching algorithm that provides much faster throughput than the gcloud CLI and does not require a local copy of data files.
- Because Fusion replaces the gcloud CLI with a native API client, data transfer is much more robust at scale.
Platform Google Cloud Batch compute environments
Seqera Platform supports Fusion in Google Cloud Batch compute environments.
See Google Cloud Batch for compute and storage recommendations and instructions to enable Fusion.
Nextflow CLI
When Fusion v2 is enabled, the following virtual machine settings are applied:
- A 375 GB local NVMe SSD is selected for all compute jobs.
- If you do not specify a machine type, a VM from the following families that support local SSDs will be selected: n1-*, n2-*, n2d-*, c2-*, c2d-*, m3-*.
- Any machine types you specify in the Nextflow config must support local SSDs.
- Local SSDs are only offered in multiples of 375 GB. You can increment the number of SSDs used per process with the disk directive to request multiples of 375 GB.
- Fusion v2 can also use persistent disks for caching. Override the disk requested by Fusion using the disk directive and the type: pd-standard option.
- The machineType directive can be used to specify a VM instance type, family, or custom machine type in a comma-separated list of patterns. For example, c2-*, n1-standard-1, custom-2-4, n*, m?-standard-*.
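The machineType and disk settings described above can be sketched as process directives. This is an illustrative example only: the process names, disk sizes, and script bodies are hypothetical placeholders, and the VM that Google Batch actually selects depends on availability in your region.

```groovy
// Hypothetical processes; names, sizes, and commands are placeholders.

process ALIGN_READS {
    // Allow any n2 or c2 VM; both families support local SSDs
    machineType 'n2-*,c2-*'
    // Request two local SSDs (2 x 375 GB) instead of the default single SSD
    disk '750 GB'

    script:
    """
    echo "alignment step"
    """
}

process SUMMARIZE {
    // Override the local SSD and use a persistent disk for caching
    disk '100 GB', type: 'pd-standard'

    script:
    """
    echo "summary step"
    """
}
```

Requesting a local SSD size that is not a multiple of 375 GB is best avoided, since local SSDs are only offered in 375 GB units.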
- Provide your Google credentials via the GOOGLE_APPLICATION_CREDENTIALS environment variable or with the gcloud auth application-default login command. See Credentials for more information.
- Add the following to your nextflow.config file:
    fusion.enabled = true
    wave.enabled = true
    process.scratch = false
    process.executor = 'google-batch'
    google.location = '<YOUR GOOGLE LOCATION>'
  Replace <YOUR GOOGLE LOCATION> with the Google region of your choice, such as europe-west2.
- Run the pipeline with the usual run command:
    nextflow run <YOUR PIPELINE SCRIPT> -w gs://<YOUR-BUCKET>/work
  Replace <YOUR-BUCKET> with a Google Cloud Storage bucket to which you have read-write access.
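As an alternative to passing -w on every run, the work directory can be set in the configuration file. The following is a minimal sketch, not a required setup: the project ID my-gcp-project and the bucket my-bucket are hypothetical placeholders. The google.project and workDir options are standard Nextflow configuration settings that do not appear in the steps above.

```groovy
// Settings from the steps above
fusion.enabled    = true
wave.enabled      = true
process.scratch   = false
process.executor  = 'google-batch'
google.location   = 'europe-west2'

// Assumed additions; replace the placeholder values with your own
google.project    = 'my-gcp-project'        // hypothetical project ID
workDir           = 'gs://my-bucket/work'   // equivalent to -w on the command line
```

With workDir set this way, the pipeline can be launched with nextflow run <YOUR PIPELINE SCRIPT> and no -w option.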