Amarel login node reboot
Tuesday, January 29, 2019 at 11:58 pm
OARC Research Computing Users,
As you may have noticed, around 6:50 pm on Tuesday, Jan 29, the login node of the Amarel cluster became partially unresponsive due to high load.
The server did not shut down completely and was slowly recovering. Our tech team was aware of the situation and monitored the recovery process. We hoped that the server would fully recover in a reasonably short time with minimal or no impact on user activity. However, the recovery progressed much more slowly than anticipated, and to avoid further negative impact on our users we decided to reboot the server at 11 pm.
The Amarel server is now back up and accepting new jobs.
Unfortunately, all open FastX and SSH sessions were lost as a result of the reboot.
Jobs that were queued or running on the cluster before the reboot should not have been affected.
We apologize for the inconvenience and for the interruption of your research.
Vlad Kholodovych, PhD