[Serverless Functions/Containers] [fr-par] abnormal 503 errors when calling custom domains
Incident Report for Scaleway
Update
The issue has been escalated to the networking team to investigate possible connectivity issues between hosts.
Posted Jun 19, 2024 - 17:30 CEST
Identified
We have identified the issue. Inside our infrastructure, some TCP connections are terminated unexpectedly. Clients whose HTTP calls go over these connections receive 503 errors. Only custom domains are affected because their traffic is routed differently from the default endpoints. On the user side, retrying on a 503 should help mitigate the issue: we have observed that the TCP connections used by two consecutive HTTP requests are unlikely to both break.
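A minimal client-side sketch of that retry mitigation. It assumes Python and a caller-supplied `send` function that performs one HTTP request and returns its status code and body. The names `call_with_retry`, `send_get`, `max_attempts` and `backoff_s` are hypothetical and are not part of any Scaleway SDK. It also assumes the request is safe to repeat. If the first attempt reached your function before the connection was reset, a retry of a non-idempotent call could duplicate its side effects.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Hypothetical signature: send() performs one HTTP request and returns (status, body).
Sender = Callable[[], Tuple[int, str]]


def call_with_retry(send: Sender, max_attempts: int = 3, backoff_s: float = 0.1) -> Tuple[int, str]:
    """Call send(), retrying only while the response status is 503.

    Assumption: the request is idempotent, so repeating it is harmless.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    attempt = 1
    while status == 503 and attempt < max_attempts:
        # Short, linearly growing pause before the next attempt (hypothetical tuning).
        time.sleep(backoff_s * attempt)
        status, body = send()
        attempt += 1
    return status, body


def send_get(url: str) -> Tuple[int, str]:
    """Hypothetical sender: one GET with urllib, returning (status, body) even for 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 503; its code and body are still available.
        return err.code, err.read().decode()
```

For example, `call_with_retry(lambda: send_get("https://your-custom-domain.example/"))` retries a GET up to three times in total. The URL is a placeholder. Capping the number of attempts keeps a persistent outage from turning into an unbounded retry loop.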

Our monitoring has shown that this affects around 100 custom domains, accounting for 0.19% of total requests. For the most affected clients, the 503 rate can reach 2%, although we have seen it fluctuate over time.

We are still not sure about the root cause, but are working on it. Sorry for any inconvenience.
Posted Jun 18, 2024 - 09:38 CEST
Update
We are still investigating.

Our tests have confirmed that only calls to custom domains, over both HTTP and HTTPS, may periodically end in 503 errors. These 503 errors have the following body: "upstream connect error or disconnect/reset before headers. reset reason: connection termination".
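A function or container can legitimately return its own 503. If you only want to retry the errors described here, one option is to match the response body quoted above. The sketch below assumes Python. The helper name `is_connection_reset_503` is hypothetical. Matching on this message is also an assumption: the text comes from the routing layer, not from your code, and could change. The sketch matches only the prefix, so other "reset reason" values are also counted.

```python
# Prefix of the 503 body quoted in this update; the "reset reason" suffix may vary.
CONNECTION_RESET_MARKER = "upstream connect error or disconnect/reset before headers"


def is_connection_reset_503(status: int, body: str) -> bool:
    """True if the response looks like the connection-termination 503 from this incident."""
    return status == 503 and CONNECTION_RESET_MARKER in body
```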

The 503 errors are sporadic, but they tend to occur in batches as the global load on our infrastructure (number of requests and number of connections) increases. We have some hypotheses to test before communicating further.

As a reminder, if you are affected: where possible, use the default provided endpoint (*.functions.fnc.fr-par.scw.cloud) instead of your custom domains. Otherwise, retrying on a 503 is unfortunately the only way to mitigate the issue while we investigate.

Sorry for any inconvenience.
Posted Jun 13, 2024 - 17:44 CEST
Update
We are still investigating. A few 503 errors are still being returned when calling custom domains. From what we have seen, HTTPS calls are more likely to end in 503 errors. Sorry for any inconvenience.
Posted Jun 12, 2024 - 14:12 CEST
Investigating
Some fr-par clients (one tenth of all clients) calling their functions or containers through a custom domain may encounter an abnormal number of HTTP 503 errors. The issue appears to affect only HTTP calls to custom domains, not calls made directly to the default endpoint, but we are still investigating.

From what we have seen so far, fewer than 4% of requests from these clients should end in 503 errors. However, this rate varies over time and is sometimes below 0.1%.

If possible, clients experiencing these 503 errors can use the default provided endpoint (*.functions.fnc.fr-par.scw.cloud) instead of their custom domains. Otherwise, retrying on a 503 is the only way to mitigate the issue while we investigate.

Sorry for any inconvenience.
Posted Jun 07, 2024 - 19:01 CEST
This incident affects: Elements - Products (Functions and Containers).