Slurm
Getting started with Slurm on ExCL, with best-practice recommendations.
Getting Started with Slurm
sbatch Template
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<your-email-address>
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=slurm-test-out.txt
#SBATCH --error=slurm-test-err.txt
#SBATCH --partition=compute
#SBATCH --nodelist=justify
GIT_ROOT=$(git rev-parse --show-toplevel)
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
echo Started: $(date)
echo Host: $(hostname)
echo Path: $(pwd)
echo --------------------------------------------------------------------------------
### Setup Environment
### Run Command
echo Run Task
echo --------------------------------------------------------------------------------
echo Finished: $(date)
Using the Preemptable GPU Queue (nvidia-long)
What “Preemptable” Means
Important Cluster Policy
What this means for users
Submitting a Preemptable Job
Basic submission
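A minimal sketch of a submission to the preemptable queue. It assumes the partition is named nvidia-long, as in the section title, and uses job.sh as a hypothetical job script name. The helper function and the DRY_RUN variable are illustrative only and are not part of Slurm or ExCL.

```shell
#!/bin/bash
# Hypothetical helper: submit a script to the nvidia-long partition.
# With DRY_RUN set, or when sbatch is not installed, it only prints the command.
submit_preemptable() {
    local script="$1"
    local cmd=(sbatch --partition=nvidia-long "$script")
    if [[ -n "${DRY_RUN:-}" ]] || ! command -v sbatch >/dev/null 2>&1; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Calling submit_preemptable job.sh on a login node runs sbatch --partition=nvidia-long job.sh. The partition can instead be set inside the script with an #SBATCH --partition=nvidia-long line.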
Making Your Job Requeueable (REQUIRED)
Example job script
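The following is a sketch of a requeueable job script, not the cluster's official template. It relies on two standard sbatch options, --requeue and --open-mode=append, and on the SLURM_RESTART_COUNT variable that Slurm sets for restarted jobs. The checkpoint file name, the step count, and the loop body are assumptions for illustration.

```shell
#!/bin/bash
#SBATCH --job-name=requeue-demo
#SBATCH --partition=nvidia-long
#SBATCH --requeue                  # allow Slurm to put the job back in the queue
#SBATCH --open-mode=append         # keep output from earlier runs after a restart
#SBATCH --output=slurm-%j-out.txt

# SLURM_RESTART_COUNT is set by Slurm on restarted runs; default to 0 elsewhere.
RESTARTS=${SLURM_RESTART_COUNT:-0}
CHECKPOINT=${CHECKPOINT:-checkpoint.txt}   # hypothetical checkpoint file
TOTAL_STEPS=${TOTAL_STEPS:-5}              # hypothetical amount of work

# Resume from the last completed step if a checkpoint exists.
step=0
if [[ -f "$CHECKPOINT" ]]; then
    step=$(<"$CHECKPOINT")
fi
echo "Run number $((RESTARTS + 1)), resuming at step $step"

while (( step < TOTAL_STEPS )); do
    # ... one unit of real work would go here ...
    step=$((step + 1))
    echo "$step" > "$CHECKPOINT"   # record progress after every step
done
echo "Completed $step steps"
```

Because a requeued job starts the script again from the top, the script itself has to find and reuse earlier progress. Here that is done with a small checkpoint file.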
Handling Preemption Gracefully (Strongly Recommended)
Example with signal handling
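A sketch of signal handling in a batch script, assuming that the batch shell receives SIGTERM some time before the job is killed. Which signal is sent and how long the grace period lasts depend on the cluster configuration, so treat both as assumptions to verify. The checkpoint file, step count, and sleep-based workload are hypothetical stand-ins.

```shell
#!/bin/bash
#SBATCH --job-name=preempt-demo
#SBATCH --partition=nvidia-long
#SBATCH --requeue
#SBATCH --open-mode=append

# Hypothetical settings; a real job would replace them with its own.
CHECKPOINT=${CHECKPOINT:-preempt-checkpoint.txt}
TOTAL_STEPS=${TOTAL_STEPS:-3}
STEP_SECONDS=${STEP_SECONDS:-0.2}   # stand-in for one unit of work

run_job() {
    stop_requested=0
    # The handler only sets a flag; the loop below then exits cleanly.
    trap 'echo "Termination signal received"; stop_requested=1' TERM

    # Resume from the checkpoint written by an earlier run, if any.
    step=0
    if [[ -f "$CHECKPOINT" ]]; then
        step=$(<"$CHECKPOINT")
    fi

    while (( step < TOTAL_STEPS && stop_requested == 0 )); do
        # Run the work in the background and wait for it. Bash delays a trap
        # handler until the current foreground command finishes, but the
        # wait builtin returns as soon as a trapped signal arrives.
        sleep "$STEP_SECONDS" &
        wait $!
        if (( stop_requested == 0 )); then
            step=$((step + 1))
            echo "$step" > "$CHECKPOINT"   # record progress after each completed step
        fi
    done

    if (( stop_requested )); then
        echo "Stopped after step $step; progress is saved in $CHECKPOINT"
    else
        echo "Finished all $TOTAL_STEPS steps"
    fi
}

run_job
```

The handler is deliberately small: it sets a flag and lets the main loop decide when it is safe to stop, so a checkpoint is never written halfway through a step.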
How Requeued Jobs Behave
Monitoring Preemptable Jobs
Check job status
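One way to check status is squeue, filtered to your own jobs in the partition. This is a sketch that assumes the partition name nvidia-long; the helper name is hypothetical, and the column selection is only a suggestion.

```shell
#!/bin/bash
# Hypothetical helper: list your jobs in the nvidia-long partition.
# Columns: %i job ID, %j name, %T state, %M elapsed time, %r reason, %N nodes
my_preemptable_jobs() {
    squeue --user="$USER" --partition=nvidia-long \
           --format="%.10i %.20j %.10T %.10M %.20r %N"
}
```

A job that has been preempted and requeued typically shows up here as PENDING again until resources become available.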
See restart count
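The output of scontrol show job includes a Restarts=N field, so the restart count can be extracted with standard text tools. The helper name and the job ID 12345 in the usage note are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print how many times a job has been restarted.
job_restarts() {
    scontrol show job "$1" | grep -o 'Restarts=[0-9]*' | cut -d= -f2
}
```

For example, job_restarts 12345 prints the number of restarts for job 12345. Inside a running job, the same information is available in the SLURM_RESTART_COUNT environment variable, which Slurm sets for restarted jobs.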
View job history
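Job history comes from the accounting database through sacct, assuming accounting is enabled on the cluster. In this sketch the --duplicates option is included so that earlier runs of a requeued job are listed along with the latest one; the helper name and the chosen columns are illustrative.

```shell
#!/bin/bash
# Hypothetical helper: show accounting records for a job, including
# earlier runs of the same job ID after a requeue.
job_history() {
    sacct --jobs="$1" --duplicates \
          --format=JobID,JobName,Partition,State,Start,End,Elapsed
}
```

For a job that was preempted, the State column of the earlier records is a quick way to see when each interruption happened.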
When to Use nvidia-long
Summary
Key takeaways