RUTGERS UNIVERSITY CONDOMINIUM CLUSTER USAGE POLICY

The Amarel cluster and other OARC-managed computing systems each comprise multiple phases of nodes distributed among RU-Camden, RU-Newark, and RU-New Brunswick/Piscataway. Once or twice per year, a new phase of nodes is added to the cluster. These new nodes may have slightly or greatly different hardware configurations and pricing than nodes from previous phases. Faculty, departments, schools, and research institutes can purchase nodes within the current phase, and an owner job queue, or partition, is created to represent the resources purchased. Depending on how the hardware configuration of the current phase compares with any previous purchases made by this owner, the owner may be assigned an additional partition or may have an existing partition expanded in capacity.

Along with the numerous owner partitions, Amarel has one general-use partition (“main” partition) per campus to which all Rutgers University students, faculty, and staff with Amarel accounts may submit jobs, with local campus users having priority over non-local users. Access to these general-use partitions is free to the Rutgers community. Jobs in the general-use partition may run on any nodes, from any phase in the cluster, that satisfy the job’s requested resources. The general-use partition is primarily a first-come, first-served job queue with emphasis on larger multi-node high-performance computing (HPC) jobs over single-core or single-node high-throughput computing (HTC) jobs. The maximum number of running jobs per user in the general queue may vary throughout the day depending on cluster load. This daily dynamic “fairshare” tuning helps ensure the resources are distributed as equitably as possible to all general users. The cluster typically has more than 95% of its nodes in use. General-use partition jobs have a maximum walltime of 336 hours (14 days).

When an owner submits a job to their owner partition, the job will automatically run on the purchased resources that this owner partition represents. Owner jobs are given priority attention by the job scheduler before any general job, may immediately preempt any general jobs running on the requested resources, and cannot themselves be preempted. Preempted jobs are automatically requeued. Owner jobs have a maximum walltime of 336 hours (14 days). The condominium model is designed so that purchased resources feel as local as possible and so that each owner has control over who may use their owner partition. Owner resources, when not in use, are available to users of the general-use partition.
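The preemption rule above can be illustrated with a minimal sketch. This is not the scheduler’s actual implementation; the Job and Node structures and their fields are hypothetical and exist only to show the stated policy: an owner job may preempt running general jobs on the resources it needs, owner jobs are never preempted, and every preempted job is automatically requeued.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Job:
        name: str
        kind: str    # "owner" or "general"
        cores: int

    @dataclass
    class Node:
        total_cores: int
        running: List[Job] = field(default_factory=list)
        requeued: List[Job] = field(default_factory=list)

        def free_cores(self) -> int:
            return self.total_cores - sum(j.cores for j in self.running)

        def start_owner_job(self, job: Job) -> bool:
            """Start an owner job, preempting general jobs only as needed.

            Running owner jobs are never preempted; each preempted general
            job is placed back in the queue (automatic requeue).
            """
            assert job.kind == "owner"
            for running_job in list(self.running):
                if self.free_cores() >= job.cores:
                    break
                if running_job.kind == "general":
                    self.running.remove(running_job)   # immediate preemption
                    self.requeued.append(running_job)  # automatic requeue
            if self.free_cores() >= job.cores:
                self.running.append(job)
                return True
            return False   # e.g., the node is already full of other owner jobs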

Cluster maintenance is performed by OARC up to 4 times per year: late May (followed by the Top 500 benchmark), early August, late November, and early February. The duration of each maintenance period may range from 3 to 48 hours or longer, depending on the work to be done. We will take the entire system off-line only when necessary. If that happens, all nodes and all storage file systems may be unavailable during the period. Users will be given 2 weeks’ notice regarding the maintenance date and duration. All jobs, running and queued, are terminated at the start of the maintenance period. Emergency maintenance, such as critical security patches or file system and network repairs, may be performed outside this quarterly schedule. OARC will coordinate with its faculty advisory committee and cluster users as these situations occur.

 

Appropriate Use of Amarel Cluster Resources

Be sure to read the system messages (the MOTD) displayed whenever you log in. Quotas and permissible limits may change as the demands on our resources change.

The login node is a shared resource and should not be used for running computationally intensive or memory-intensive applications (that’s what compute nodes are for). Compiling code and running basic research administration tasks are fine, but production calculations or analyses must be avoided. Such work impacts the ability of others to use the shared system and can interrupt the availability of the login node. A user doing this will be suspended from access to the login node until (1) that user works with an ACI-REF to determine the cause of the problem, (2) a solution is found, and (3) a staff member is available to reinstate the account.

Don’t create and run programs that waste or abuse system resources. Running distributed computing projects is not permitted on the Amarel cluster. Examples include, but are not limited to, the RC5 cryptographic challenge and the SETI@home project.

In general, be considerate and be honest with yourself. If you think that what you are doing might be unethical or take advantage of other users, either don’t do it or contact us (e-mail: help@hpc.rutgers.edu) for an informed opinion. If you are having problems with other users on the system, please try to inform them politely of the problem and work things out. If this proves impossible, contact us at help@hpc.rutgers.edu.

 

Reservation Request Process

In cases where the general queuing system is not adequate, researchers may request a reservation for dedicated or high-priority access to a collection of nodes for a specified period of time. This is appropriate in cases where limits on the number of nodes, walltime, or other resources prevent a job from running. It is generally not appropriate as a means of circumventing the general-use partition in order to have a job complete more quickly (at the expense of other users).

The applicant makes a reservation request at http://oarc.rutgers.edu/reservation-request. Reservation requests from students must have a valid sponsor, advisor, or instructor. The applicant will be notified whether the request is approved or denied and whether there are any further actions to take. Under normal circumstances, a decision should be reached within 10 business days.

The following details must be provided by the applicant. If a field is left empty, the value used for that field will be “any,” “none,” “not applicable,” or “default,” as appropriate.

– research abstract

– why the reservation is required

– type of reservation request

– how many nodes

– how many cores per node

– type of cores

– how much RAM per core

– type of network interconnect

– how many GPUs, if any, and what type

– amount of storage required, if the home directory and/or scratch space is not sufficient

– duration of reservation in hours/days/weeks/months as appropriate

– maximum walltime for jobs

– desired start time of reservation

– list of NetIDs with access to the reservation

Upon request by the committee, the applicant will provide further details on any aspects of the request as needed. Research scientists from OARC are available to assist with filling out the form.

 

Committee Approval Process

With the reservation request details in hand, OARC will determine, to the extent possible, the impact of the reservation request on the cluster, and will present its findings to the committee. The committee will vote to approve or deny the request. A reservation requesting more than 5% of a given resource or a lifetime longer than 7 days must be voted on by the committee; smaller requests may be expedited by OARC. If the request is approved, the committee will ask OARC to make the reservation. OARC may require 2 weeks of lead time to make the reservation if the requested resources need to be drained of existing workload. OARC will inform the committee when the reservation is active, and the committee will contact the applicant.

Requests will be approved if the following conditions are met:

  • A quorum has voted, where a quorum is defined as 1/2 of the faculty on the committee
  • A simple majority of those in the quorum voted to approve the request

The target number of active faculty members on the committee is 12.

A reservation may be delayed if it conflicts with a scheduled cluster maintenance period. 
If the request is denied, the committee will email the applicant with the reasons for rejection and suggestions for future requests or recommendations for other resource use. 
If the committee is unable to reach a decision within 5 business days, the request will be discussed for another 5 days, with all committee members encouraged to participate and vote.

 

Types of Reservation Requests

Exclusive-use dedicated resources: For the entire duration of the reservation, the resources associated with the reservation may be used only by the applicant.

Back-fillable dedicated resources: When the applicant is not using the reservation resources at 100% capacity, preemptable general jobs may run on those resources as appropriate.

Special priority partition: Jobs within this partition may not be preempted and will run on non-exclusive resources matching the queue configuration. Jobs within this partition may preempt general jobs.

Special general partition: This partition may be configured with parameters not available within the existing general partition, for example, a longer maximum walltime. Jobs within this partition may be preempted by owner jobs.

Rutgers Industry: This reservation is used for Rutgers faculty doing external, for-profit work. Resources are leased at a discounted rate to be negotiated by relevant parties before the request is approved.

External Industry: External companies may request time on the Amarel cluster. The external company will pay the total cost (all equipment, power, cooling, and system administration) plus indirect costs. The request will go through the normal approval process, with provisions made to shield the general Rutgers user population from the effects of such usage (e.g., outside company jobs don’t preempt Rutgers user jobs).

External Academic: Users from other academic institutions may request time on the Amarel cluster. As long as such requests do not negatively impact our Rutgers users, this is a good opportunity for outreach and for recruiting future owners. The applicant must have a Rutgers sponsor. If the request would have a significant impact on Rutgers users, it should be brought before OARC.

External Partner: Occasionally Rutgers will want to allow companies with which Rutgers does business access to the Amarel cluster for benchmarking and testing. This activity helps the vendor proactively fix potential issues, while Rutgers benefits from improved performance and a direct link to the software developers.

Other: Special cases that don’t fit into any category above.

 

Definitions

Rutgers Match: A Rutgers match is an arrangement in which Rutgers covers the infrastructure costs of buying a node, such as networking, power, cooling, cabling, and racks, while the owner pays the cost of the node itself.

RU User: Any Rutgers faculty member, staff member, or student.

Owner: An Owner is someone or some entity who has purchased or leased one or more nodes in the Amarel cluster. In order to make Amarel feel as local as possible (i.e., similar to a system managed by the owner), owners receive the highest priority with respect to job submissions, system administration, responses to ticket and/or email requests or queries, installation of open-source and commercial software packages (as long as the user has a license), and consultations. A special owner partition is provided at the time of purchase (see the sections on owner partitions).

Owner Group Member RU: An Owner Group Member RU is a Rutgers faculty/staff/student who is part of an owner research group.

Owner Group Member Non-RU: An Owner Group Member Non-RU is someone external to Rutgers University who is a member of an owner research group. This person cannot submit jobs to the general-use partition.

External Owner: An External Owner is someone or some entity who has purchased or leased one or more nodes in the Amarel cluster without a Rutgers match. This owner cannot submit jobs to the general-use partition.

External Owner Group Member RU: An External Owner Group Member RU is a Rutgers faculty/staff/student who is a member of an external owner group. This person may submit jobs to the general-use partition.

External Owner Group Member Non-RU: An External Owner Group Member Non-RU is someone external to Rutgers University who is a member of an external owner group. This person cannot submit jobs to the general-use partition.

 

The Amarel Cluster Queue Configuration

Queue Types

There are two queue types: routing and execution. Routing queues automatically route jobs to execution queues or to other routing queues. Jobs are considered for execution only when they are residing in an execution queue. Jobs may be submitted only to routing queues; jobs cannot be submitted directly to execution queues. An execution queue is one of two types: express or normal. Jobs in express execution queues can preempt jobs in normal execution queues.

General-Use Partition

The One-Rutgers general-use partition is a routing queue named MAIN(?) and is the default job partition for the entire system. This means that if a job is submitted without specifying a partition name, it will automatically enter MAIN(?). If an owner job does not specify an owner partition name, the job will automatically enter MAIN(?) as a general job.

Each campus cluster will have a uniquely named routing partition available only to the respective campus community. These partitions will have priority over the general-use partition, but not owner partitions.

MAIN(?) routes jobs to one of nine normal execution partitions based on the job’s resource request. These execution partitions are considered in the order shown in the table below: the scheduler will try to fit the job into the first execution partition, and if it does not fit, try the second partition, and so on, until the job enters the appropriate execution partition.

 

If the job does not fit into any execution partition because the resource request does not match any execution partition definition, the job remains in MAIN(?) and will never run. Typically, this means either the user is unfamiliar with how to make resource requests, or the user is asking for too many resources and such a request might be better handled with an OARC reservation.

Execution partition   Min cores   Max cores   Min GPUs   Max GPUs   Fairshare mem per core
solo                  1           1           0          0          2 GB
single                2           16          0          0          2 GB
tiny                  17          128         0          0          4 GB
small                 129         512         0          0          8 GB
medium                513         2048        0          0          8 GB
large                 2049        8192        0          0          8 GB
gpu_small             1           32          1          4          4 GB
gpu_medium            1           256         5          16         4 GB
gpu_large             1           1024        17         64         4 GB

 

If the job can fit into an execution partition, but the user is already using their fairshare for that partition, the job will be held in the MAIN(?) routing partition until one or more of that user’s jobs in the execution partition has completed. For example, the solo partition accepts only one-core jobs, so a user at their solo fairshare limit will have additional one-core jobs held in MAIN(?) until earlier ones finish.
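As an illustration of this routing behavior, here is a short sketch. The partition limits are copied from the table above; the route function and its arguments are hypothetical and are not part of any real scheduler interface.

    from typing import Optional

    # (name, min cores, max cores, min GPUs, max GPUs), in routing order,
    # taken from the execution partition table above.
    EXECUTION_PARTITIONS = [
        ("solo",       1,    1,    0,  0),
        ("single",     2,    16,   0,  0),
        ("tiny",       17,   128,  0,  0),
        ("small",      129,  512,  0,  0),
        ("medium",     513,  2048, 0,  0),
        ("large",      2049, 8192, 0,  0),
        ("gpu_small",  1,    32,   1,  4),
        ("gpu_medium", 1,    256,  5,  16),
        ("gpu_large",  1,    1024, 17, 64),
    ]

    def route(cores: int, gpus: int = 0) -> Optional[str]:
        """Return the first execution partition whose limits fit the request.

        Returns None if nothing matches, in which case the job would stay
        in the MAIN(?) routing partition and never run.
        """
        for name, min_c, max_c, min_g, max_g in EXECUTION_PARTITIONS:
            if min_c <= cores <= max_c and min_g <= gpus <= max_g:
                return name
        return None

    # A 64-core CPU-only job routes to "tiny"; a 2-core job with 1 GPU
    # routes to "gpu_small"; a 10,000-core request matches nothing.
    assert route(64) == "tiny"
    assert route(2, gpus=1) == "gpu_small"
    assert route(10000) is None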

 

Queue Categories

Each execution partition is assigned to one fixed partition category. Each node is configured to accept jobs from one or more partition categories, depending on the hardware configuration of the node.

The partition categories are:

[[ To be filled in as we learn about the different use cases. ]]

 

All nodes, regardless of hardware configuration, accept jobs from partitions with partition category OARC(?). For all other categories, a node will accept jobs from partitions with a given category only if the node’s hardware configuration matches that partition category.
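The acceptance rule can be summarized in a one-line check. The category names in the example below are purely illustrative, since the actual partition categories are still to be defined.

    def node_accepts(node_categories: set, partition_category: str,
                     universal_category: str = "OARC(?)") -> bool:
        """Return True if a node accepts jobs from a partition category.

        Every node accepts the universal category; for any other category,
        the node's hardware-derived category set must contain it.
        """
        return (partition_category == universal_category
                or partition_category in node_categories)

    # Hypothetical example: a GPU node carrying categories {"gpu", "skylake"}
    # accepts jobs from a "gpu" partition and from the universal category,
    # but not from a "knl" partition.
    assert node_accepts({"gpu", "skylake"}, "gpu")
    assert node_accepts({"gpu", "skylake"}, "OARC(?)")
    assert not node_accepts({"gpu", "skylake"}, "knl")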

Owner Partitions: Owner partitions are routing partitions that route jobs to express execution partitions that are configured, using a partition category, to run jobs on the type of node purchased and on the campus where the node is located. Each owner execution partition is sized according to the number of nodes purchased. An owner may have multiple routing partitions depending on the phases in which nodes were purchased.

 

As we build out the “One-Rutgers” advanced computing environment, these policies will need to be updated and adjusted to address new requirements and concerns. Adding cloud-based resources to the One-Rutgers configuration will increase complexity from an allocation perspective, and careful consideration will need to be given to the adopted priority structure.