
Newton User Guide

This guide is still under construction, but here’s a little info about Newton:

Newton is a CentOS 5.11 Linux compute cluster, an IBM iDataPlex system that originated at UMDNJ and is intended to support UMDNJ researchers. There is a great deal of useful information about this system here: http://newton.umdnj.edu

Newton is currently configured with:
38 CPU-only nodes, each with 8 Intel Xeon E5440 cores + 16 GB RAM
20 CPU-only nodes, each with 8 Intel Xeon E5440 cores + 12 GB RAM
1 GPU node with 8 Intel Xeon E5-2609 cores + 4 GPUs + 32 GB RAM
1 GPU node with 24 Intel Xeon E5-2620 cores + 4 GPUs + 32 GB RAM
2 CPU-only testing nodes, each with 8 Intel Xeon E5440 cores + 16 GB RAM
1 CPU-only large memory node with 32 Intel Xeon E5-4640 cores + 256 GB RAM

Connecting to the Newton cluster

ssh [your NetID]@newton.hpc.rutgers.edu
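
For example, a user whose NetID is xx345 (the placeholder NetID that also appears in the job-monitoring examples later in this guide) would type:

ssh xx345@newton.hpc.rutgers.edu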

View information about compute nodes and cluster job queues

Example of using the sinfo command:

sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
main           up 2-00:00:00      3  down* node[026,037,042]
main           up 2-00:00:00      3   comp node[014,025,040]
main           up 2-00:00:00      3  drain node[035,043-044]
main           up 2-00:00:00      1   resv node036
main           up 2-00:00:00      1    mix node048
main           up 2-00:00:00      7  alloc node[027-030,045-047]
main           up 2-00:00:00     47   idle cuda[001-002],memnode001,node[001-013,015-024,031-034,038-039,041,049-062]
harpertowns    up 2-00:00:00      2  down* node[026,037]
harpertowns    up 2-00:00:00      3   comp node[014,025,040]
harpertowns    up 2-00:00:00      1  drain node035
harpertowns    up 2-00:00:00      1   resv node036
harpertowns    up 2-00:00:00      4  alloc node[027-030]
harpertowns    up 2-00:00:00     29   idle node[001-013,015-024,031-034,038-039]
testing*       up      15:00      1  down* node042
testing*       up      15:00      1   idle node041
nehalems       up 2-00:00:00      2  drain node[043-044]
nehalems       up 2-00:00:00      1    mix node048
nehalems       up 2-00:00:00      3  alloc node[045-047]
nehalems       up 2-00:00:00     14   idle node[049-062]
gpu            up   infinite      1   idle cuda001
gpu_k20        up   infinite      1   idle cuda002
largemem       up 2-00:00:00      1   idle memnode001

Understanding this output:

There are 7 job queues (listed as partitions in the sinfo output):
main (traditional compute nodes, CPUs only)
testing (traditional compute nodes, CPUs only, intended only for short tests)
harpertowns (traditional compute nodes, CPUs only)
nehalems (traditional compute nodes, CPUs only)
gpu (nodes with general-purpose GPU accelerators)
gpu_k20 (nodes with Tesla K20 general-purpose GPU accelerators)
largemem (traditional compute node with additional memory)

The upper limit for a job’s run time on most partitions is 2 days (48 hours); the testing partition has a much shorter limit (15 minutes, as shown in the TIMELIMIT column of the sinfo output above).
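
If you only need information about one partition, sinfo can be limited to it with the -p option; for example (this output simply repeats the gpu line from the full listing above):

sinfo -p gpu
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu            up   infinite      1   idle cuda001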

Running a serial batch job (only 1 core)

Here’s an example of a SLURM job script for a serial job.
I’m running a program called “zipper” which is in my /scratch (temporary work) directory.
I plan to run my entire job from within my /scratch directory.

#!/bin/bash
#SBATCH --partition=main         # Partition (job queue)
#SBATCH --job-name=zipx001a      # Assign an 8-character name to your job, no spaces, no special characters
#SBATCH --nodes=1                # Number of compute nodes
#SBATCH --ntasks=1               # Number of tasks to run (often = cores) on each node
#SBATCH --mem=2000               # Total real memory required (MB) for each node
#SBATCH --time=02:00:00          # Total run time limit (HH:MM:SS)
#SBATCH --output=slurm.%N.%j.out # Combined STDOUT and STDERR output file
#SBATCH --export=ALL             # Export your current environment settings to the job environment
cd /scratch/[your NetID]
srun /scratch/[your NetID]/zipper/2.4.1/bin/zipper < my-input-file.in

Understanding this job script:

A job script contains the instructions for the SLURM workload manager (cluster job scheduler) to manage resource allocation, scheduling, and execution of your job.
The lines beginning with #SBATCH contain commands intended only for the workload manager.
My job will be assigned to the “main” partition (job queue).
This job will only use 1 CPU core and should not require much memory, so I have requested only 2 GB of RAM — it’s a good practice to request only about 2 GB per core for any job unless you know that your job will require more than that.
My job will be terminated when the run time limit has been reached, even if the program I’m running is not finished. It is not possible to extend this time after a job starts running.
Any output that would normally go to the command line will be redirected into the output file I have specified, and that file will be named using the compute node name and the job ID number.
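
For example, if this job were assigned job ID 1633383 and ran on node036 (the job ID and node name used in the monitoring examples later in this guide), the slurm.%N.%j.out pattern would produce an output file named:

slurm.node036.1633383.out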

Here’s how to run a batch job, loading modules and using the sbatch command:

First, be sure to configure your environment as needed for running your job. This usually means loading the necessary modules, if any:

$ module purge
$ module load intel/16.0.3 cuda/7.5
$ sbatch my-job-script.sh

The sbatch command reads the contents of your job script and forwards those instructions to the SLURM workload manager. Depending on the level of activity on the cluster, your job may wait in the job queue for minutes or hours before it begins running.
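
When sbatch accepts your job script, it replies with the job ID assigned to your job; that is the ID you will use with the monitoring commands described later in this guide. For example (the job ID shown here is just the one used in those monitoring examples):

$ sbatch my-job-script.sh
Submitted batch job 1633383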

Running a parallel batch job (2 or more cores)

Here’s an example of a SLURM job script for a parallel job.
See the previous (serial) example for some important details omitted here.

#!/bin/bash
#SBATCH --partition=main         # Partition (job queue)
#SBATCH --job-name=zipx001a      # Assign an 8-character name to your job, no spaces, no special characters
#SBATCH --nodes=1                # Number of compute nodes
#SBATCH --ntasks=8               # Number of tasks to run (often = cores) on each node
#SBATCH --mem=10000              # Total real memory required (MB) for each node
#SBATCH --time=02:00:00          # Total run time limit (HH:MM:SS)
#SBATCH --output=slurm.%N.%j.out # Combined STDOUT and STDERR output file
#SBATCH --export=ALL             # Export your current environment settings to the job environment
cd /scratch/[your NetID]
srun --mpi=pmi2 /scratch/[your NetID]/zipper/2.4.1/bin/zipper < my-input-file.in

Understanding this job script:

The srun command is used to coordinate communication among the parallel tasks of your job. You must specify how many tasks you will be using, and this number usually matches the --ntasks value in your job’s hardware allocation request. Note that the software you are using must be “MPI-aware”: running a non-MPI program this way will simply launch “ntasks” identical copies of your program, which will not speed up processing and may have other undesirable side effects (see the multithreaded sketch after this list).
This sample job will use 8 CPU cores and slightly more than 1 GB RAM per core, so I have requested a total of 10 GB of RAM — it’s a good practice to request only about 2 GB per core for any job unless you know that your job will require more than that.
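
If your program is multithreaded (OpenMP, for example) rather than MPI-based, the usual approach is to keep --ntasks=1 and instead request several CPUs for that one task with --cpus-per-task. Here is a minimal sketch of such a script; the zipper paths are reused from the examples above purely as placeholders, and zipper itself may not actually be a threaded program:

#!/bin/bash
#SBATCH --partition=main         # Partition (job queue)
#SBATCH --job-name=zipx001a      # Assign an 8-character name to your job
#SBATCH --nodes=1                # Number of compute nodes
#SBATCH --ntasks=1               # One task...
#SBATCH --cpus-per-task=8        # ...with 8 CPU cores allocated to it for threads
#SBATCH --mem=10000              # Total real memory required (MB) for each node
#SBATCH --time=02:00:00          # Total run time limit (HH:MM:SS)
#SBATCH --output=slurm.%N.%j.out # Combined STDOUT and STDERR output file
#SBATCH --export=ALL             # Export your current environment settings to the job environment
cd /scratch/[your NetID]
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # tell an OpenMP program how many threads it may use
/scratch/[your NetID]/zipper/2.4.1/bin/zipper < my-input-file.in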

Here’s how to run a batch job using the sbatch command:

$ module purge
$ module load mpi/mvapich2/2.0.1/intel/15.0.2
$ sbatch my-job-script.sh

Note here that I’m loading a module for the parallel communication libraries (MPI libraries) needed by my parallel executable.
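
If you’re not sure which modules (including MPI libraries) are installed on the cluster, the module command can show you:

$ module avail   # list every module installed on the cluster
$ module list    # list the modules currently loaded in your session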

Monitoring the status of jobs

The simplest way to quickly check on the status of active jobs is by using the squeue command:

squeue -u [your NetID]

  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
1633383      main   zipper    xx345   R       1:15      1 node036

Here, the state of each job is typically listed as either PD (pending) or R (running), along with the amount of allocated run time that has been used so far (DD-HH:MM:SS).
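
If a job needs to be stopped before it finishes (whether it is still pending or already running), cancel it by job ID with the scancel command; for example, using the job ID from the squeue output above:

scancel 1633383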

For summary accounting information (including jobs that have already completed), you can use the sacct command:

sacct

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
1633383          zipper       main      statx          8    RUNNING      0:0

Here, the state of each job is listed as PENDING, RUNNING, COMPLETED, FAILED, or another final state such as CANCELLED or TIMEOUT.
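
sacct can also restrict its report to your own jobs over a chosen time range and display additional fields; here is a sketch using standard sacct options (the start date and the list of fields are only examples):

sacct -u [your NetID] --starttime=2016-09-01 --format=JobID,JobName,Partition,Elapsed,MaxRSS,State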