Live migration allows SpinUp to move a Cloud Server to a different physical host machine. This article explains what live migration is, how it works, and the situations that can lead to one.
What is live migration?
Live migration is an automated process that moves a Cloud Server from one host machine to another. It is called ‘live’ because the Cloud Server does not need to reboot or shut down, so the process causes no downtime. The only interruption to services is a few milliseconds of network disruption, which is short enough that most applications are unaffected or recover gracefully. In almost all cases, the operating system does not notice that a live migration took place.
Live migration works by copying the server to another host machine. It starts with a kind of local snapshot on the originating host machine, similar to how Cloud Server Snapshots work. This snapshot is moved to the target host machine. Once the transfer completes, a smaller snapshot is created from the changes made on the server during the original transfer. This delta snapshot is then transferred, and the process repeats as many times as needed until the source and destination copies of the disk are identical.
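The iterative disk copy described above can be illustrated with a small simulation. This is a minimal sketch, not SpinUp's implementation: the disk is modeled as a Python list of block values, `precopy_disk` and `write_ratio` are hypothetical names, and the model assumes that the number of writes the guest makes during a round is proportional to how many blocks that round sent (a shorter round leaves less time for new writes). That assumption is what lets the deltas shrink until the copies match.

```python
import random


def precopy_disk(source, write_ratio=0.25, seed=0):
    """Copy `source` (a list of block values) using iterative delta rounds.

    Returns (destination, number of blocks sent in each round).
    """
    rng = random.Random(seed)
    dest = [None] * len(source)
    dirty = set(range(len(source)))  # round 1: every block must be sent
    rounds = []
    while dirty:
        # Send only the blocks that changed since the previous round.
        for i in dirty:
            dest[i] = source[i]
        rounds.append(len(dirty))
        # Assumption: writes made while this round was in flight scale with
        # the round's size. They are applied after the copy here, so every
        # written block is marked dirty and re-sent in the next round.
        writes = int(len(dirty) * write_ratio)
        dirty = set()
        for _ in range(writes):
            i = rng.randrange(len(source))
            source[i] += 1
            dirty.add(i)
    return dest, rounds


disk = list(range(1000))
dest_disk, rounds = precopy_disk(disk)
print(dest_disk == disk, rounds)
```

Each entry in `rounds` is the size of one delta. The list shrinks round by round until a round produces no new changes, at which point the two copies are identical.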
Next, the contents of the Cloud Server’s RAM go through a similar process: a copy is made and transferred, then a copy of the delta is made and transferred, and this repeats a few times. However, RAM changes too rapidly for the source and destination copies ever to be perfectly in sync. Instead, once the difference between the two copies is small enough, the Cloud Server is ‘paused’ at the hypervisor level. Pausing resembles putting a PC to sleep or into hibernation, in that changes to RAM are temporarily halted and, as in hibernation, the contents of RAM are saved. The similarity ends there, because here the remaining RAM contents are sent to the destination host machine rather than to disk, and the server is immediately unpaused there. This step accounts for the few milliseconds of network interruption mentioned earlier.
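The RAM phase differs from the disk phase in one way: it stops on a size threshold rather than on perfect sync. The sketch below models that under stated assumptions. `migrate_ram`, `threshold`, and `min_writes` are hypothetical names, and the guest is assumed to dirty at least `min_writes` pages per round while it runs, so the pre-copy loop alone can never drain the dirty set to zero. The final loop stands in for the pause: no more writes occur, and the leftover pages are sent.

```python
import random


def migrate_ram(ram, threshold=16, min_writes=8, write_ratio=0.5,
                max_rounds=30, seed=0):
    """Pre-copy `ram` (a list of page values), then stop-and-copy the rest.

    Returns (destination, pages sent per pre-copy round, pages sent while paused).
    """
    rng = random.Random(seed)
    dest = [None] * len(ram)
    dirty = set(range(len(ram)))  # first round sends every page
    rounds = []
    # Pre-copy phase: the guest keeps running and keeps dirtying pages.
    while len(dirty) > threshold and len(rounds) < max_rounds:
        for i in dirty:
            dest[i] = ram[i]
        rounds.append(len(dirty))
        # Assumption: at least `min_writes` pages change per round, so the
        # dirty set never reaches zero while the guest runs.
        writes = max(min_writes, int(len(dirty) * write_ratio))
        dirty = set()
        for _ in range(writes):
            i = rng.randrange(len(ram))
            ram[i] += 1
            dirty.add(i)
    # Stop-and-copy: the guest is paused (no more writes), the remaining
    # dirty pages are sent, and the guest would then resume on the destination.
    for i in dirty:
        dest[i] = ram[i]
    return dest, rounds, len(dirty)


memory = [0] * 4096
dest_ram, rounds, paused_pages = migrate_ram(memory)
print(dest_ram == memory, rounds, paused_pages)
```

`paused_pages` is the small batch sent while the guest is paused. Keeping that batch below the threshold is what keeps the pause, and the resulting network interruption, short.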
Once the final RAM copy is transferred and the destination copy is unpaused, the live migration is complete. From the OS point of view, the server never restarted or shut down, and all programs and applications continue to run as if nothing happened. The process is designed to be as seamless as possible.
When is live migration used?
Because a live migration has minimal impact, SpinUp primarily uses it to reduce downtime for customers. It can be needed in several scenarios, but the main reasons are maintenance and emergencies.
For example, suppose a security patch is released for a component of our host machines, and the patch requires the host machine to reboot. Normally, if the patch were applied and the host machine restarted, every Cloud Server on it would be down for the duration of the reboot. For a host machine, this can take anywhere from 5 to 30 minutes, depending on how many Cloud Servers need to be started after the host machine itself finishes rebooting.
However, this is not how SpinUp performs such patching. Instead, when such a patch becomes available, the host machine is emptied by live migrating the Cloud Servers on it to other host machines. This moves them off the machine with no downtime for you, the end user. Once the host machine is empty, the patch is applied and the machine is rebooted. We then move on to the next host machine and repeat the process. This method lets SpinUp keep the cloud infrastructure up to date without inconveniencing you.
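The rolling patch procedure can be sketched as a loop over host machines. This is an illustration only: the `Host` class, `live_migrate`, `rolling_patch`, and the capacity-based placement policy (prefer an already-patched host with spare room) are hypothetical and do not describe SpinUp's tooling. `live_migrate` simply stands in for the process described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int
    servers: list = field(default_factory=list)
    patched: bool = False


def live_migrate(server, src, dst):
    # Hypothetical stand-in for a live migration: the server keeps running.
    src.servers.remove(server)
    dst.servers.append(server)


def rolling_patch(hosts):
    """Empty, patch, and reboot each host in turn. Returns a migration log."""
    log = []
    for host in hosts:
        for server in list(host.servers):
            # Assumed policy: any other host with spare room, patched hosts first.
            targets = sorted(
                (h for h in hosts if h is not host and len(h.servers) < h.capacity),
                key=lambda h: not h.patched,
            )
            if not targets:
                raise RuntimeError(f"no room to evacuate {host.name}")
            live_migrate(server, host, targets[0])
            log.append((server, host.name, targets[0].name))
        # The host is now empty, so patching and rebooting it affects no guests.
        host.patched = True
    return log


hosts = [Host("host-1", 4, ["web-1", "web-2"]), Host("host-2", 4, ["db-1"]), Host("host-3", 4)]
for server, src, dst in rolling_patch(hosts):
    print(f"{server}: {src} -> {dst}")
```

Preferring hosts that are already patched means a server that has been moved once is less likely to need moving again; at the start of a cycle, when no host is patched yet, some servers may still move twice.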
SpinUp may also use live migration to minimize the effects of a Host Server Down. Depending on the cause of the failure, this option may not be available: some issues, such as a networking failure on the host machine, make live migration impossible. Whenever the issue does not rule it out, SpinUp will live migrate Cloud Servers off the affected host machine. When migrations can occur, this means minimal downtime in an otherwise disastrous situation. It also means SpinUp can first move customers out of harm's way and then deal with the issue without having to worry about potential data loss.
Risks with live migration
Issues with live migration are extremely rare. SpinUp has tested hundreds of the most common applications and Linux packages, as well as all of the Base Images we provide, and live migration succeeded in over 99.99% of cases. In rare cases, esoteric or custom applications may be disrupted by a live migration. SpinUp seldom sees such cases, but recommends building redundancy into your environment to mitigate the issue should it affect you. Redundancy is worth having for many reasons beyond live migration, since it also protects against much more common issues. See our guide on redundancy for details.
Can I opt out of live migrations?
In short, no. Live migration allows SpinUp to keep its infrastructure up to date, which in turn improves the stability and security of the platform. As with any computer security, an environment is only as secure as its most vulnerable device. For this reason, SpinUp will issue live migrations to carry out updates that would otherwise require downtime. As described above, live migration in these scenarios helps eliminate downtime for customer servers.