Wednesday 12th June 2019

Compute Nodes [Scaleway] Hot-snapshots temporarily disabled in some situations

Description of the issue: We have identified a bug in the snapshot feature.

When a hot-snapshot is requested (from the web console or via the API), the snapshot process fails to complete if all of the following conditions apply:

  • the instance is a virtualized offer of the x86 family (GP1, DEV1, START1, VC1, X64*, RENDER)
  • AND the instance was last started in or after March 2019
  • AND the instance is still running today
  • AND the volume being hot-snapshotted matches one of these conditions:
    • it was created before June 2018
    • OR it was created more recently, but from a snapshot created before June 2018
    • OR it was created more recently, but from a snapshot that was itself created from another snapshot created before June 2018
  • AND the hypervisor running the instance is in a special state
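The conditions above combine with AND at the top level and OR inside the volume sub-list. The minimal Python sketch below expresses that logic as a predicate, purely as an illustration. The `Instance` and `Volume` records, their field names, and the `hypervisor_special_state` flag are hypothetical (the hypervisor state is not something a customer can observe). Matching offers by commercial-type prefix, and reading "before June 2018" as "before 2018-06-01" and "in or after March 2019" as "on or after 2019-03-01", are also assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Offer families named in the notice. Matching by prefix (e.g. "GP1-XS",
# "X64-15GB") is an assumption about how commercial type names are formed.
AFFECTED_PREFIXES = ("GP1", "DEV1", "START1", "VC1", "X64", "RENDER")

JUNE_2018 = date(2018, 6, 1)
MARCH_2019 = date(2019, 3, 1)


@dataclass
class Volume:
    created: date
    # Hypothetical field: creation dates of the snapshots this volume was
    # built from, nearest first: [source snapshot, its own source, ...].
    ancestor_snapshots: List[date] = field(default_factory=list)


@dataclass
class Instance:
    commercial_type: str  # hypothetical field, e.g. "DEV1-S"
    last_start: date
    running: bool


def volume_matches(volume: Volume) -> bool:
    """OR of the volume sub-conditions: the volume itself, its source
    snapshot, or that snapshot's own source snapshot was created before
    June 2018. Only the two snapshot generations listed are checked."""
    lineage = [volume.created] + volume.ancestor_snapshots[:2]
    return any(d < JUNE_2018 for d in lineage)


def hot_snapshot_affected(instance: Instance, volume: Volume,
                          hypervisor_special_state: bool) -> bool:
    """True only when every top-level condition holds (AND)."""
    return (
        instance.commercial_type.upper().startswith(AFFECTED_PREFIXES)
        and instance.last_start >= MARCH_2019
        and instance.running
        and volume_matches(volume)
        and hypervisor_special_state
    )
```

For example, a running GP1 instance last started in April 2019, whose volume was created in 2019 from a snapshot that was itself built from a March 2018 snapshot, matches every condition except the hypervisor state. Whether it is affected therefore depends on a factor only the Scaleway team can see.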

Impact: In such cases, the volume is left in the "snapshotting" state, and so is the snapshot, which is therefore unavailable (it cannot be used to create a new volume). The instance keeps running and can be used normally (no reboot has been observed). However, the customer can no longer stop or reboot the instance or request any other action on it; manual intervention by the Scaleway team is required.

Temporary disabling and fix: We are working on a definitive fix so that hot-snapshots can be created again in all situations. Until the bug is fixed, we have disabled the hot-snapshot feature for instances that match the above conditions: the API will refuse the creation request. This prevents customers' instances from being "blocked" (no action possible by the customer). Note that the hot-snapshot feature remains available for all other instances.

Workarounds:

  1. Put the instance in standby (the "stop-in-place" action in the API), take a snapshot (a.k.a. cold-snapshot), then start the instance again => this workaround is temporary and best suited to a limited number of snapshots, as it is not automation-friendly. NB: until the definitive fix is available, this process must be repeated for each snapshot.

or

  2. Create a new instance from scratch (using official Scaleway images, or a snapshot created after June 2018), manually move the data between volumes, delete the old instance, and perform hot-snapshots on the new instance => this lets you create as many hot-snapshots as needed and will not break existing automated hot-snapshot operations, even before our definitive fix is available.