Host Server Down Events


This article describes our host server down (HSD) process.

What is a host server down event?

Virtualized servers reside on a physical server called a host (sometimes called a hypervisor). It is SpinUp’s responsibility to manage the underlying hardware (the host). It is your responsibility to manage everything relating to your virtual Cloud Server.

The host hardware, like all technology, will eventually fail or need maintenance. When it does, the downtime affects you unless you have made the appropriate preparations. When a problem takes a host offline, all Cloud Servers on that host go offline as well.

Host server down notifications

When a host or hardware failure impacts you, SpinUp sends an email to the primary contact on the account to inform you about the HSD event. By that time, engineers are already aware of the problem and are working to resolve it. When the host comes back online, you receive a final update.

Actions to take after HSD events

Note
We strongly recommend that you don't make any changes to your affected resources during an ongoing HSD event.

When you receive a notification that a host has been recovered, log in to the affected Cloud Server and verify that its services have started and that its data is intact.
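The service check can be scripted so that it is quick to repeat after every recovery. The following is a minimal sketch, assuming a Linux Cloud Server that uses systemd. The function names are illustrative and are not part of any SpinUp tooling, and the list of services to check is something you supply.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def systemctl_is_active(service: str) -> bool:
    """Return True if systemd reports the unit as active."""
    try:
        # `systemctl is-active --quiet UNIT` exits with 0 only when the unit is active.
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", service],
            check=False,
        )
    except FileNotFoundError:
        # systemctl is not installed, so the state cannot be confirmed.
        return False
    return result.returncode == 0


def check_services(
    services: Iterable[str],
    is_active: Callable[[str], bool] = systemctl_is_active,
) -> Dict[str, bool]:
    """Map each service name to whether it is currently running."""
    return {name: is_active(name) for name in services}


def failed_services(status: Dict[str, bool]) -> List[str]:
    """Return the names of services that are not running, sorted by name."""
    return sorted(name for name, ok in status.items() if not ok)
```

For example, `failed_services(check_services(["nginx", "sshd"]))` returns the subset of those two units that did not come back up; the unit names here are placeholders for whatever your server runs. The `is_active` parameter exists so that the check can be swapped for another init system or exercised without systemd.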

Remember that any configuration change that was not saved so that it persists across a reboot reverts and must be reapplied. For example, if you added a rule to the Linux iptables configuration but did not save it persistently, you must add the rule again and then save it.
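One way to catch this problem before an outage is to compare the rules that are currently loaded with the rules that are saved on disk. The sketch below assumes you pass it two pieces of text in `iptables-save` format: the output of the `iptables-save` command and the contents of your saved rules file. The location of that file depends on the distribution and how persistence was set up (for example, `/etc/iptables/rules.v4` or `/etc/sysconfig/iptables`), so confirm the path on your own server. The function names are illustrative.

```python
from typing import List


def rule_lines(dump: str) -> List[str]:
    """Extract the rule lines (those starting with -A) from iptables-save style text."""
    return [
        line.strip()
        for line in dump.splitlines()
        if line.strip().startswith("-A ")
    ]


def unsaved_rules(running: str, saved: str) -> List[str]:
    """Return rules that are loaded now but missing from the saved rules text."""
    saved_set = set(rule_lines(saved))
    return [rule for rule in rule_lines(running) if rule not in saved_set]
```

Any rule that `unsaved_rules` returns would be lost at the next reboot. This comparison is a simplification: it ignores which table a rule belongs to, and it does not compare chain policies or counters.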

How to avoid future downtime

We recommend that you set up your infrastructure so that there is no single point of failure.

For example, use a Load Balancer to maintain continuity of your service in the event that a host goes down. Even in the case of a single Cloud Server environment, a Load Balancer enables you to retain access to the same public IP address so that you can clone a replacement Cloud Server from a snapshot. To accomplish this with minimal downtime, we also recommend that you set up scheduled snapshots and backups and test both snapshots and backups periodically to ensure you can quickly clone production Cloud Servers or restore data from backups to a new Cloud Server.
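Scheduled snapshots only help if they are actually being taken. As a rough illustration, the sketch below flags servers whose newest snapshot is older than a limit you choose. It assumes you have already gathered the timestamp of the latest snapshot for each server by some means; the dictionary shape, the server names, and the function name are hypothetical and do not correspond to a SpinUp API.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_snapshots(
    latest: Dict[str, datetime],
    max_age: timedelta,
    now: datetime,
) -> List[str]:
    """Return the servers whose newest snapshot is older than max_age, sorted by name."""
    return sorted(
        name for name, taken in latest.items() if now - taken > max_age
    )


# Hypothetical data: server name mapped to the time of its latest snapshot.
example = {
    "web-1": datetime(2024, 1, 10, tzinfo=timezone.utc),
    "db-1": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
stale = stale_snapshots(
    example,
    max_age=timedelta(days=7),
    now=datetime(2024, 1, 12, tzinfo=timezone.utc),
)
```

A freshness check like this does not replace a restore test. It only shows that a recent snapshot exists, not that a Cloud Server cloned from it works.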

Two articles, Architecting for the Cloud and Server Redundancy, cover these topics in more detail.

