Saturday 2nd February 2019

Compute Nodes Network outage on p12 platform

We are noticing outages on multiple C2 and VC1 servers in PAR1.
Some servers are still not reachable, and new actions performed on them may not complete correctly.


04.02.19 1100Z (1200LT)

There was a hardware issue on part of our infrastructure. Our team at the datacenter managed to fix it, and servers should now be available again. If you still experience an issue, please try rebooting the server from the console panel, and do not hesitate to contact our support team if it doesn't work as expected.

02.02.19 0435Z (1735LT)

Some instances are still not reachable. We are currently investigating.

02.02.19 0230Z (0330LT)

All impacted servers should be functional now.

A detailed report on the incident will be published next week.

02.02.19 0010Z (0110LT)

The issue has now been fixed except for a single hypervisor, which we are still working on.
C2 servers must be rebooted manually; the network will be reacquired on boot.

Following this major outage, a blog post will be published once the root cause and consequences are fully identified and fixed.
Initial diagnosis indicates that corrupted requests caused cascading failures.

Impact:
API unavailability, network unavailability on a small number of C2 and VC1 instances.

02.02.19 1830Z (1930LT)

Our teams are still actively working on the few unreachable virtual servers.
The issue is very complex and will probably require the full availability of our engineering teams to recover all systems.
Some nodes cannot be restored in their current state; further analysis will be carried out on Monday.

Our support team is still fully available if you have any questions.
However, response times might be longer than usual following tonight's outage.

We are truly sorry for the inconvenience.