Due to an issue with the cooling systems at one of our datacenter providers, the temperature in a specific room exceeded safe operating thresholds. As a safety measure, several servers automatically shut down to prevent hardware damage and data loss.
The incident initially disrupted our Block Storage service, with cascading effects on Instance hypervisors. Downstream products such as Kapsule (our managed Kubernetes product), Managed Databases and Public Gateways were also impacted.
All reference times are in UTC. For people in France, the Netherlands and Poland, local time on these dates was UTC+2 (CEST).
Tuesday, July 1, 2025
13:33 UTC (15:33 UTC+2)
Internal monitoring detected a temperature rise in one datacenter room in Amsterdam.
14:00 UTC (16:00 UTC+2)
The datacenter provider confirmed the outage and reported a cooling failure in the affected room.
14:52 UTC (16:52 UTC+2)
Despite preemptive efforts on our end to reduce the power demand in the affected room, the temperature rose to the point where our Block Storage services became unavailable, leading to issues with Instances, Kapsule, Managed Databases, Load Balancers, and Public Gateways.
15:00 UTC (17:00 UTC+2)
As a precaution, we began shutting down servers to protect customer data and our infrastructure.
16:43 UTC (18:43 UTC+2)
Cooling systems were restored and temperatures began to decrease.
18:46 UTC (20:46 UTC+2)
Once temperatures returned to safe operating levels and the cooling systems were confirmed stable, we began safely restarting affected systems.
20:34 UTC (22:34 UTC+2)
Block Storage services were fully back to normal.
21:49 UTC (23:49 UTC+2)
All hypervisors were fully back to normal.
23:56 UTC (01:56 UTC+2 on July 2)
Most impacted services were back to normal.
Wednesday, July 2, 2025
00:30 UTC (02:30 UTC+2)
Kapsule nodes (Instances) were fully back to normal.
00:43 UTC (02:43 UTC+2)
Public Gateways were fully back to normal.
01:23 UTC (03:23 UTC+2)
Managed Databases were fully back to normal.
We are actively working with both the datacenter provider and the cooling system vendor to assess possible upgrades that would improve resilience against extreme weather conditions.