To determine the number of GPUs your account can access in a SLURM-managed cluster, follow these steps:
1. Check Account and Partition Access: Use the command `sacctmgr show associations` to view your account’s associations with partitions. Look for GPU-specific partitions (e.g., `gpu` or `gpu-guest`). (The commands for steps 1–3 are sketched right after this list.)
2. Inspect Node Configuration: Run `scontrol show nodes` to see the GPU configuration of the cluster’s nodes. Look for `Gres=gpu:X` (or `gres/gpu=X` inside `CfgTRES`), where `X` is the number of GPUs available on that node.
3. Query Resource Limits: Use `sacctmgr show qos` to check the Quality of Service (QoS) limits for your account, which may include GPU limits.
4. Contact Cluster Admins: If you’re unsure about your GPU allocation, reach out to your cluster administrators. They can provide specific details about your account’s GPU access.
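A combined sketch of steps 1–3, assuming the standard SLURM client tools are on your `PATH` (format field names can vary slightly between SLURM versions):

```bash
# Step 1: which accounts, partitions and QoS is my user associated with?
sacctmgr show associations user=$USER format=Cluster,Account,Partition,QOS,MaxTRES

# Step 2: how many GPUs does each node advertise?
scontrol show nodes | grep -E "NodeName|Gres=|CfgTRES"

# Step 3: which QoS limits (possibly including GPU limits) apply?
sacctmgr show qos format=Name,Priority,GrpTRES,MaxTRES,MaxWall
```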
Output explanation
Let’s interpret an example output (here, the default `sacctmgr show associations` table):
Cluster Account User Partition Share Priority GrpJobs GrpTRES GrpSubmit GrpWall GrpTRESMins MaxJobs MaxTRES MaxTRESPerNode MaxSubmit MaxWall MaxTRESMins QOS Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ---------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
slurm root 1 normal
slurm root root 1 normal
slurm_clu+ root 1 normal
slurm_clu+ root root 1 normal
This output comes from querying QoS and resource-allocation details in a SLURM-managed cluster; the field explanations below follow the columns reported by `sacctmgr show qos` (several of them, such as GrpTRES and MaxTRES, also appear in the association table above). Here’s an interpretation of the columns and key rows:
Explanation of Fields:
- Name: The name of the QoS configuration (e.g., `normal`, `dgx2q-qos`, and `defq-qos`).
- Priority: Priority of jobs submitted under the QoS. A priority of `0` indicates no special prioritization.
- GraceTime: The time allowed for a job before it is preempted, if preemption is enabled (here it’s `00:00:00`, meaning no grace time).
- Preempt / PreemptExemptTime / PreemptMode: Related to job preemption settings (none specified in the table).
- Flags: Special flags applied to the QoS (e.g., `cluster` in all rows indicates resources are managed at the cluster level).
- UsageThres / UsageFactor: Utilization thresholds and scaling factors for resource usage (defaults are shown here).
- GrpTRES: Group-level Trackable Resources (TRES), such as GPUs, CPUs, or memory allocations.
- GrpTRESMins, GrpTRESRunMin, GrpJobs: Limits on resource usage over time or for running jobs.
- MaxTRES: Maximum number of trackable resources (e.g., `4` in `dgx2q-qos` and `defq-qos`, which likely refers to GPUs).
- MaxWall: Maximum wall time for jobs under this QoS (not specified here).
- MaxJobsPU / MaxSubmitPU: Limits on the number of jobs a user can run or submit simultaneously (not specified here).
- MinTRES: Minimum resources required for jobs under this QoS (not specified).
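If the default `sacctmgr show qos` table is too wide to read comfortably, you can restrict it to the limit-related fields discussed above; a pipe-delimited sketch (field names may differ slightly between SLURM versions):

```bash
# Narrow, pipe-delimited view of the QoS limits
sacctmgr -P show qos format=Name,Priority,Flags,GrpTRES,MaxTRES,MaxWall

# Cross-check which QoS your own account is actually allowed to use
sacctmgr show associations user=$USER format=Account,Partition,QOS
```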
Key Observations:
- The `normal` QoS doesn’t have a `MaxTRES` value specified, meaning there may not be GPU access under this QoS.
- The `dgx2q-qos` and `defq-qos` configurations both have a `MaxTRES` of `4`, which likely indicates a limit of 4 GPUs accessible for jobs submitted under these QoS settings (a matching job request is sketched after this list).
- No other strict resource usage or job limits are explicitly defined in this table.
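As a concrete check of such a 4-GPU `MaxTRES` limit, a job request under one of these QoS settings might look like the sketch below; the QoS name is taken from the example above, while the partition name is a placeholder you should replace with a GPU partition reported on your cluster.

```bash
#!/bin/bash
#SBATCH --job-name=gpu-limit-test
#SBATCH --partition=gpu        # placeholder; use a GPU partition from your cluster
#SBATCH --qos=defq-qos         # QoS name from the example output above
#SBATCH --gres=gpu:2           # any value up to the MaxTRES limit (4 in the example)
#SBATCH --time=00:10:00

nvidia-smi                     # lists only the GPUs allocated to this job
```

Requesting more GPUs than `MaxTRES` allows will typically leave the job pending or get it rejected with a QoS-related reason, which is another way to confirm the limit.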
“access denied” when trying to run `sinfo`
If you’re seeing “access denied” when trying to run a SLURM command like `sinfo`, it likely means you don’t have the necessary permissions to query the SLURM system or that SLURM isn’t set up correctly for your user. Here’s how to troubleshoot it:
✅ 1. Check SLURM installation
Run:
which sinfo
- If this returns nothing, SLURM tools may not be in your `PATH`.
- Fix: load the SLURM module (if your cluster uses environment modules):
module load slurm
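A quick way to verify that fix, assuming your site uses environment modules and names the module `slurm` (module names vary between clusters):

```bash
module avail slurm           # is a SLURM module provided at all?
module load slurm
which sinfo && sinfo --version
```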
✅ 2. Use `sinfo` with elevated permissions (if allowed)
Some clusters restrict SLURM tools to specific user groups or roles. If you’re a new user:
- Ask your cluster admin to confirm that:
  - Your user account is added to the proper group (e.g., `slurm`, `hpcusers`)
  - You have read access to SLURM configuration files (e.g., `slurm.conf`)
  - There are no node- or partition-level access restrictions
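You can partly check the first two points yourself before contacting the admins (the `slurm.conf` path is site-dependent; `/etc/slurm/slurm.conf` is only a common default):

```bash
id -nG $USER                     # groups your account belongs to
ls -l /etc/slurm/slurm.conf      # is the SLURM config present and readable here?
```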
✅ 3. Try with `scontrol show partition`
Sometimes `sinfo` is restricted, but `scontrol` is more verbose and may still be accessible:
scontrol show partition
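If that works, a quick filter shows which partitions carry GPU resources (the exact TRES strings depend on how the cluster is configured):

```bash
scontrol show partition | grep -iE "PartitionName|TRES"
```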
✅ 4. Check node access with `squeue`
Check whether you’re able to run:
squeue -u $USER
If this works, SLURM is functioning, but access to other commands may be restricted.
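If `squeue` works, a short interactive test is the most direct way to confirm actual GPU access; the partition name below is a placeholder for one you found in the earlier steps:

```bash
srun --partition=gpu --gres=gpu:1 --time=00:05:00 nvidia-smi
```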
❗ If nothing works:
Ask your cluster administrator or support team:
- Whether your user account has been fully set up
- Whether SLURM commands are restricted to specific users or groups
- If partitions/nodes are restricted to specific projects or labs