Our Amsterdam clusters have experienced an NVMe failure, leading to spontaneous and isolated desynchronisation of some buckets. No data has been lost. Synchronisation is ongoing, and you may experience write issues during this period.
The cluster is now back to its nominal state.
The stabilisation process of the cluster is now complete. Thanks to hardware swaps (both NVMe and RAM) inside the AMS DC, we have been able to establish a clearer and more accurate picture of the overall impact.
We are now in the final phase: performing the last interventions on a few isolated buckets (<1%) that still return 500 errors. The scripts have been validated and are now running on those buckets to restore them to their nominal state.
The bucket cleaning/resync that we have been performing over the last 3 days is now complete.
It appears that some isolated sync-related issues remain on some of your buckets:
Let us stress that your data, even if temporarily less accessible, is safe.
We, of course, will keep you updated.
The issue is still under investigation, as we discovered another underlying issue while fixing the first one.
Some buckets may still not return any information (404), while others may report incorrect data synchronisation.
We will update this status with more details as soon as possible.
The resync process is still in progress. Nearly 75% of the impacted buckets are now back to a normal state. A rough estimate puts completion of the remaining 25% at the end of the day tonight. Please stay tuned. Again, no data has been lost, and we thank you for your patience.
We are still in the process of repairing the cluster. As stated yesterday, no data has been lost. The process is slow, as we need to check each database. Some buckets have already been restored; the others are on the way. Thanks for your understanding.
The issue has been escalated to the local team.