Sunday 12th April 2020

Object Storage - NVMe failure

Our Amsterdam clusters have experienced an NVMe failure, leading to spontaneous and isolated desynchronisation of some buckets. No data has been lost. Synchronisation is ongoing, and you might experience write issues during this period.

===================

23/04/2020 1330Z (1530LT)

The cluster is now back to its nominal state.

21/04/2020 1520Z (1720LT)

The stabilisation process of the cluster is now over. Thanks to hardware swaps (both NVMe and RAM) inside the AMS datacentre, we have finally been able to get a clearer and faster assessment of the overall impact.
We are now in the last phase: final interventions on the few isolated buckets (<1%) that still return 500 errors. The repair scripts have been validated and are now running on those buckets to bring them back to their nominal state.

16/04/2020 1215Z (1415LT)

The bucket cleaning/resync that we have performed over the last 3 days is now complete.
However, some isolated sync-related issues remain on some of your buckets:

  • Less than 3% of buckets are still being repaired; these return 4XX-class errors.
  • Less than 2% of buckets are still being resynced; these may return isolated 5XX-class errors on write/delete actions (a client-side retry sketch follows below).

Let us stress that your data, even though it may be temporarily less accessible, is safe.
We will, of course, keep you updated.
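
For clients hitting the transient 5XX errors mentioned above, a simple retry with exponential backoff around write/delete calls is usually enough to ride out the resync. The sketch below is only an illustration, assuming an S3-compatible endpoint accessed with boto3; the endpoint URL, bucket and key names are placeholders, not values tied to this incident.

  import time
  import boto3
  from botocore.exceptions import ClientError

  # Hypothetical endpoint and default credentials; replace with your own configuration.
  s3 = boto3.client("s3", endpoint_url="https://s3.example-cloud.com")

  def put_with_retry(bucket, key, body, attempts=5):
      # Retry PUTs that fail with a 5XX during the resync, backing off exponentially.
      for attempt in range(attempts):
          try:
              return s3.put_object(Bucket=bucket, Key=key, Body=body)
          except ClientError as err:
              status = err.response.get("ResponseMetadata", {}).get("HTTPStatusCode", 0)
              if 500 <= status < 600 and attempt < attempts - 1:
                  time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, 8s between attempts
                  continue
              raise  # 4XX (bucket under repair) or retries exhausted: surface the error

  # Example usage with placeholder names:
  # put_with_retry("my-bucket", "backups/archive.tar.gz", b"...")

The same pattern applies to delete_object calls; 4XX errors from buckets still under repair should not be retried, as they will only clear once the repair finishes.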

16/04/2020 0915Z (1115LT)

The issue is still under investigation, as we discovered another underlying issue while fixing the first one.
Some buckets may still not return any information (404), while others may report incorrect data synchronisation.
We will update this status with more details as soon as possible.

14/04/2020 0845Z (1045LT)

The resync process is still in progress. Nearly 75% of the impacted buckets are now back to a normal state. We roughly estimate that the last 25% will be done by the end of the day. Please stay tuned. Again, no data has been lost, and we thank you for your patience.

13/04/2020 1500Z (1700LT)

We are still in the process of repairing the cluster. As stated yesterday, no data has been lost. The process is slow because we need to check each database. Some buckets have already been restored; the others are on the way. Thank you for your understanding.

12/04/2020 1645Z (1845LT)

The issue has been escalated to the local team.