HILL DATA CENTER EMERGENCY: Amarel cluster emergency shutdown due to cooling failure
Due to a cooling equipment failure in the Hill data center, we must start draining and powering off Amarel nodes in Piscataway. Once enough nodes are powered off to stop the room temperature from rising, we can return some of them to service.
All users will likely be affected. You may continue submitting jobs, but they will spend longer in the PD (pending) state. At this time we don’t have an estimated time for a resolution. The vendor's technical team is expected to be on site tomorrow, July 6, 2023, to investigate further.
Thank you for your understanding and cooperation.
Vlad Kholodovych, PhD
Interim Director, Research Support / Senior Scientist