Management Transition & Current Status:
The management and operations responsibilities for Caliburn were transferred to OARC on July 31st. At the time of this transition, Caliburn was down for an extended maintenance window between allocation periods, with only basic service/administrative systems running (i.e., not operating in production mode). With Caliburn offline, it is not possible to estimate the system’s operational capacity (deliverable compute service units, SUs). Preparations are underway to begin determining Caliburn’s capacity and performance characteristics.
Aug. 26, 2019: An extended power outage on one of our campuses over the weekend disrupted power for the entire Caliburn system. When power was restored, it was found that Caliburn’s power and cooling management systems were not configured to return to service automatically. As a result, recharging of the UPS system generated unabated heat in Caliburn’s data center, demonstrating that upgraded (automated) power and cooling control systems would be required for ongoing operation of the Caliburn system. That upgrade process is currently underway and, consequently, Caliburn remains offline.
Dec. 20, 2019: Between August and the end of September, the technical team at OARC took time to research the system and to prepare a plan of action that would return Caliburn to production. OARC system administrators accessed and installed monitoring systems and tested their processes.
The Modular Data Center (MDC) preparation should be complete by mid-January, allowing Caliburn to return to full production by January 29, 2020. The high-level task schedule appears below:
MDC Caliburn Readiness Plan
|Task|Duration|Start|Finish|
|---|---|---|---|
|Caliburn Compute Production Readiness|87.25d|09/30/19|01/29/20|
|Chiller Plant Controls Enhancement Options|77d|09/30/19|01/14/20|
|MDC Infrastructure Monitoring|40d|11/18/19|01/10/20|
|Configuration Performance Testing|3d|12/17/19|12/19/19|
|Caliburn System Readiness|20d|12/20/19|01/16/20|
A more detailed version of the MDC Caliburn Readiness Plan is available here.
Jan. 30, 2020: The Modular Data Center that houses Caliburn will be upgraded with new cooling system components early next week. To accommodate this important upgrade, we need to postpone the return to normal operations until Monday, Feb. 17th.
Access to Storage System:
Sept. 24, 2019: We’ve been able to align infrastructure work schedules sufficiently to temporarily enable access to Caliburn’s storage system. That system is now available via SFTP only. Our team will keep these systems up and accessible between 12:00 pm ET Monday and 3:00 pm ET Friday this week (Sept. 23 – Sept. 27) and next week (Sept. 30 – Oct. 4).
project /gpfs/gpfs/project1/<project-id>
staging /gpfs/gpfs/staging/<project-id>
scratch /gpfs/gpfs/scratch/<username>
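For scripted transfers, the three storage locations listed above can be assembled from a project ID and a username. The following is a minimal Python sketch under stated assumptions: the function name `caliburn_paths` and the sample identifiers are hypothetical, and no SFTP hostname is included because none is given here.

```python
import posixpath

# Root shared by the three GPFS paths in the listing above.
GPFS_ROOT = "/gpfs/gpfs"


def caliburn_paths(project_id: str, username: str) -> dict:
    """Return the project, staging, and scratch paths for one user.

    Hypothetical helper: it only fills in the <project-id> and <username>
    placeholders from the listing above.
    """
    for label, value in (("project_id", project_id), ("username", username)):
        # Reject empty values and path separators so a typo cannot
        # point a transfer at a different directory.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return {
        "project": posixpath.join(GPFS_ROOT, "project1", project_id),
        "staging": posixpath.join(GPFS_ROOT, "staging", project_id),
        "scratch": posixpath.join(GPFS_ROOT, "scratch", username),
    }


if __name__ == "__main__":
    # "abc123" and "jdoe" are made-up example values.
    for name, path in caliburn_paths("abc123", "jdoe").items():
        print(f"{name}: {path}")
```

The resulting paths can then be given to any SFTP client as the remote directories to download from.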
Sept. 22, 2019: Our infrastructure team and the OIT facilities / data center team are still working to repair and install components needed to stabilize Caliburn. We hope to be able to offer access to data stored in Caliburn’s storage systems in approximately two weeks (around mid-October).
Sept. 3, 2019: Returning Caliburn’s storage system to service so researchers can access and transfer data is currently a top priority. OARC’s infrastructure team is working closely with OIT to find a solution that will enable keeping the machines comprising Caliburn’s storage system running while the data center and components of Caliburn undergo maintenance.
Current Round of Submitted Proposals:
Sept. 22, 2019: We are very disappointed to report that, at this time, Caliburn is not expected to be safe to operate in production mode for months. Much of the infrastructure acquisition and installation work required is similar to that encountered when initially building a system of this kind: it will take time to purchase, install, and configure the required components.
Sept. 3, 2019: The proposals currently in review will remain in an “in review” status until Caliburn’s operational capacity has been determined. Once Caliburn’s compute and storage capacity are known, we can begin allocating requested resources in an effort to approve as much of the proposed work as possible.
For computational work that cannot be accommodated by Caliburn, open access to OARC’s Amarel system may serve as a useful alternative. Use of the Amarel cluster is free for all Rutgers students, staff, and faculty. Proposal submission is not required for using Amarel. All that is requested when completing our access request form is a brief description of how you plan to use the system: https://oarc.rutgers.edu/access
Further Questions or Need Help:
Please contact OARC’s research support team at firstname.lastname@example.org