After investigation, we have found the root cause.
During the scheduled maintenance (https://status.scaleway.com/incidents/p2cj27y80n9w), starting on 03/03 at 3:30 PM UTC, our infrastructure came under heavy load, saturating the connection capacity of our load balancer (LB). As a result, health checks struggled to complete, causing our backends to be repeatedly marked "down", then "up", then "down" again, even though they were healthy.
While a backend was marked "down", no requests could reach our infrastructure, resulting in TLS errors. These "down" phases each lasted a few minutes and frequently alternated with "up" phases, causing a partial disruption: all requests made during a "down" phase were rejected. The situation improved overnight (03/03 10 PM UTC to 03/04 6:30 AM UTC), as we were receiving fewer requests.
On 03/04 at 3:10 PM UTC, rebalancing our backends and closing idle connections freed capacity and resolved the issue: backends remained "up" and ready to receive traffic.
We will add more monitoring on our LB to prevent this from happening again.
We apologize once again for any inconvenience.
Posted Mar 09, 2026 - 19:59 CET
Update
As stated in the last message, we'd like to emphasize that the incident is now over. However, we will keep it open in "Monitoring" status until we have found the root cause.
The incident lasted between:
- 03/03 3:30 PM UTC to 03/03 10:00 PM UTC
- 03/04 6:30 AM UTC to 03/04 3:10 PM UTC
This amounts to roughly 15h10m in total, during which 5xx errors, closed connections, and TLS issues occurred.
We will provide updates and close this incident once the root cause has been found. We have a few leads and are checking the different hypotheses.
Posted Mar 06, 2026 - 11:29 CET
Monitoring
Following the recurrence of intermittent errors earlier today, we have taken several mitigation steps:
- 14:22 CET: Reloaded gateways
- 15:24 CET: Rebalanced node workloads
- 16:10 CET: Forcefully cleared stuck connections
These actions appear to have stabilized the system. At this time, service in the nl-ams region has returned to nominal levels, and error rates are within normal bounds.
Note: The connection cleanup at 16:10 CET may have caused brief 502 errors visible in your Cockpit for a short period. These are expected and should not persist if your services are healthy.
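For callers that saw some of these brief 502 or EOF errors, a short client-side retry with backoff is usually enough to absorb them. Below is a minimal, hedged sketch in Go (not an official client library; the endpoint URL is hypothetical) that retries idempotent GET requests on transient errors and 502 responses.

```go
// Minimal client-side sketch (illustrative only) for absorbing brief 502 / EOF
// errors by retrying idempotent requests with a short linear backoff.
package main

import (
	"fmt"
	"net/http"
	"time"
)

// getWithRetry performs a GET request and retries up to maxRetries additional
// times when the request fails with a transient error (e.g. EOF) or returns 502.
func getWithRetry(url string, maxRetries int) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		resp, err := http.Get(url)
		if err == nil && resp.StatusCode != http.StatusBadGateway {
			return resp, nil // success, or a non-502 status the caller should handle
		}
		if err != nil {
			lastErr = err
		} else {
			resp.Body.Close()
			lastErr = fmt.Errorf("received %d", resp.StatusCode)
		}
		// linear backoff: 500ms, 1s, 1.5s, ...
		time.Sleep(time.Duration(attempt+1) * 500 * time.Millisecond)
	}
	return nil, lastErr
}

func main() {
	// Hypothetical container endpoint; replace with your own service URL.
	resp, err := getWithRetry("https://my-container.example.com/", 3)
	if err != nil {
		fmt.Println("request failed after retries:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```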
We are now closely monitoring the environment for any signs of elevated errors or latency. Our team remains on high alert and will act quickly if further issues arise.
We sincerely apologize for the repeated impact and thank you for your patience as we work to ensure long-term stability.
Posted Mar 04, 2026 - 16:44 CET
Identified
We are currently observing new occurrences of the reported errors. Our team is actively investigating the situation and working on resolving the issue.
Posted Mar 04, 2026 - 13:59 CET
Monitoring
The situation has improved following the mitigations we implemented. We are now closely monitoring the system to ensure stability across Serverless Containers in the nl-ams region.
If you are still experiencing issues, please reach out to our support team so we can assist you promptly.
Posted Mar 04, 2026 - 11:08 CET
Update
Early monitoring shows some signs of improvement.
We've applied a mitigation and are currently evaluating its impact.
We will provide further updates as more information becomes available.
Posted Mar 04, 2026 - 10:48 CET
Investigating
Since the conclusion of yesterday's maintenance (https://status.scaleway.com/incidents/p2cj27y80n9w), we have been made aware of intermittent errors affecting Serverless Containers in the nl-ams region. Users are experiencing issues such as TLS handshake failures, 502 errors, and connection timeouts (EOF), along with increased request latencies.
We sincerely apologize for the disruption and any impact this may have on your services. We are treating this with the highest priority and will provide updates as we make progress.
Posted Mar 04, 2026 - 10:03 CET
This incident affected: Elements - AZ (nl-ams-1, nl-ams-2, nl-ams-3) and Elements - Products (Serverless Functions).