Disaster planning involves identifying the failure points in an entire system and then addressing those issues using fault tolerance, backups and response planning. The success of a disaster plan is determined by how quickly and effectively the system can be restored.
Writing a computer disaster recovery plan starts with deciding whether failures will be treated at the system level or the component level. For example, a failed desktop computer can either be replaced entirely or repaired by diagnosing and replacing the failing component. Generally, the most failure-prone parts of any system are the moving parts. In computers, these include fan motors and hard disk drive motors.
A disaster plan for computers and phones should also include a maintenance plan. Many of the internal disasters that happen to a business computer system are the result of delayed or overlooked maintenance. Common maintenance items include:
- software updates: weekly or monthly.
- fan cleaning: annually.
- hard drives: test annually, with replacement every 5 years or 50,000 power-on hours.
- UPS batteries: test monthly, replace every 2-4 years.
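The hard drive rule in the list above (replace at 5 years or 50,000 power-on hours, whichever comes first) can be checked with a few lines of Python. This is only a minimal sketch: the `Drive` record and the `drive_needs_replacement` function are hypothetical names made up for illustration, and the power-on hours are assumed to be read separately from the drive (for example, from SMART attribute 9).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Drive:
    """Hypothetical inventory record for one hard drive."""
    name: str
    installed: date
    power_on_hours: int  # assumed to be read from SMART attribute 9


def drive_needs_replacement(drive: Drive, today: date,
                            max_years: int = 5,
                            max_hours: int = 50_000) -> bool:
    """Return True when either replacement limit has been reached."""
    age_days = (today - drive.installed).days
    too_old = age_days >= max_years * 365
    too_many_hours = drive.power_on_hours >= max_hours
    return too_old or too_many_hours


if __name__ == "__main__":
    # Example inventory; the names and values are invented.
    inventory = [
        Drive("reception-pc", date(2018, 3, 1), 21_000),
        Drive("file-server", date(2021, 6, 15), 52_500),
    ]
    for drive in inventory:
        if drive_needs_replacement(drive, date.today()):
            print(f"{drive.name}: schedule a replacement")
```

Keeping the limits as parameters makes it easy to adjust them for drives that run around the clock, such as those in a server.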
Workstation and laptop hard drives:
Every computer has an internal hard drive that stores all of the programs and data. Hard drives can fail in many ways, either gradually or suddenly without warning. In addition, they are susceptible to viruses and software malfunctions that can disable a station or entire network.
For workstation backup, we recommend installing a second internal hard drive or connecting a USB backup drive and using Ghost to create a single off-line mirror image, or Novastor backup software to create multiple compressed backup files while the system is in use.
While newer solid state storage devices (SSDs) are more reliable than hard disk drives, they can still fail, though for different reasons. For laptops, replacing a hard disk drive with an SSD provides improved protection from shock and movement damage. When upgrading an existing computer from a hard disk drive to solid state storage, retaining the original hard drive alongside the SSD is an effective way to provide a second drive for backup storage in each station.
We recommend Novastor Professional workstation backup software, since it includes all of the features necessary for disaster recovery, including: open file backup, compressed backup files, scheduled backups, bare metal disaster recovery and e-mail notification.
For the ultimate in workstation recovery, Novastor can be configured to copy all files from a primary drive to a secondary drive, along with storing image recovery files in a folder on the secondary drive. The backup copy option ensures the secondary drive is immediately usable during a primary drive failure, while the additional backup files can be valuable for restoring missing or corrupt files.
Server hard drives:
Windows 2000/2003/2008 server software includes disk mirroring as an option. Disk mirroring requires two hard drives. Data is saved to both hard drives simultaneously, but can be read back from the drives separately (called “split seeks”) for improved performance. This provides protection from both disk errors and disk failure in a single drive.
While many high-end servers include hardware features for RAID disk mirroring or striping, we recommend avoiding them due to their complexity. Hardware RAID features also mask the SMART attributes on a hard drive, making it difficult or impossible to diagnose or predict a failing hard drive.
When planning the backup of a server, image backups will provide a faster restore better suited to rebuilding an entire server. Individual file backup will permit restoring files but requires more work when rebuilding a system from an empty hard drive, also known as a bare metal disaster recovery.
In addition to a local USB backup drive that remains connected to the server, we recommend two additional portable USB drives that can be exchanged and taken off-site. These portable drives maintain a copy of all of the backup files.
Off-site backup over an Internet connection is also possible, but these services carry a recurring cost, and a full system restore typically relies on a USB hard drive being shipped overnight.
Backup systems should be regularly monitored for failure. Frequently, backup systems are found to have stopped working without notice, resulting in wider data loss. Choosing backup software that provides an e-mail notification is a valuable method for monitoring a backup. In addition to reporting, the actual backup files should be checked to confirm the creation date and size. Significant changes in backup size or duration should be investigated.
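The date and size check described above can be sketched as a small Python function. This is an illustration under stated assumptions, not a feature of any backup product: the `check_backup` name, the 24-hour age limit and the 25% size tolerance are hypothetical values, and the file's last-modified time is used as a stand-in for its creation date.

```python
import os
import time


def check_backup(path, max_age_hours=24.0, previous_size=None,
                 size_tolerance=0.25, now=None):
    """Return a list of warnings for one backup file (empty list = looks fine)."""
    if not os.path.isfile(path):
        return [f"backup file is missing: {path}"]

    warnings = []
    info = os.stat(path)
    now = time.time() if now is None else now

    # Age check: last-modified time is assumed to reflect when the backup ran.
    age_hours = (now - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        warnings.append(f"backup is {age_hours:.1f} hours old")

    # Size check: an empty file, or a large change since the last run, is suspect.
    if info.st_size == 0:
        warnings.append("backup file is empty")
    elif previous_size:
        change = abs(info.st_size - previous_size) / previous_size
        if change > size_tolerance:
            warnings.append(f"backup size changed by {change:.0%}")

    return warnings
```

A script like this could be run from a scheduled task each morning, with `previous_size` taken from the prior day's result, so that a stalled backup is noticed even when the notification e-mail itself fails to arrive.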
Spare computer systems:
Every office should have at least one spare computer and monitor, so that when an existing computer fails and is removed for repair, the spare can be set up as a replacement.
Internal phone systems:
Traditional phone systems rely on a controller or “PBX unit” that is subject to wear and failure. Having a single “POTS” jack available near the phone controller, along with a long cable and handset, can provide a simple and effective work-around for receiving calls while a phone controller is being repaired. In addition, we recommend connecting phone systems to a battery-backed UPS to keep the phone system running during a power outage.
For newer VoIP phones and service, the phones are frequently powered by a Power-over-Ethernet (POE) switch, so both the POE switch and the Internet connection (router and Internet service equipment) must be connected to a UPS to provide phone service during a power outage.
We recommend avoiding the use of POE switches, since they are much more expensive than regular Ethernet switches. They typically run warm and have multiple fans due to their high power output, requiring a larger UPS and additional cooling. POE switches are also a significant single point of failure, disabling all phones and computers when they fail.
For existing systems that already have a POE switch, a good disaster plan includes power transformers with battery backup (UPS) for important phones, such as a reception desk.
External phone service:
For traditional analog phone service, an external failure can be caused by an equipment outage at the telephone company central office (CO) or in the lines between the CO and phone system. In either case, phone companies are generally aware of these outages immediately, since the circuits are monitored for disruption. While re-routing phone service to a cell phone or voice mail is possible, there can be delays of up to 24 hours for the phone company to process these requests. We recommend contacting your telephone service provider to review the available options for handling outages in advance of a failure.
Voice-over-IP service (VoIP):
VoIP services rely on an Internet connection instead of a traditional central office for communication. While an Internet outage can disable both voice and data, many VoIP systems include a “network availability number” feature. With this feature, the VoIP provider can automatically re-direct incoming phone calls to a designated phone number, so that no calls are lost.
VoIP systems can also be configured to automatically send incoming calls to voice mail when there is an excessive number of incoming calls or unavailable additional lines. These options can usually be found in the web page control panel provided by the VoIP provider.
In addition to the files of a web-site, the zone file should be backed up, since it is a common point of failure. This is a configuration file that stores the names and IP addresses for all of the Internet services of a domain, such as web and e-mail servers. Zone files can easily be reset to default values during a system upgrade, resulting in e-mail and web-site outages.
Simply printing a copy of the zone file can be invaluable in restoring e-mail and web-sites during an outage. This can be done using a zone file utility, or by request from the hosting provider.
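Once a copy of the zone file has been saved, it can also be compared against the current zone file after an upgrade to see which records have gone missing. The following Python sketch shows one simple way to do this. The function names are hypothetical, the sample records use reserved example names and addresses, and the comparison works line by line, so a record that spans several lines (such as an SOA record written with parentheses) is compared one line at a time.

```python
def zone_records(text):
    """Return normalised record lines, skipping blank lines and ';' comment lines."""
    records = []
    for line in text.splitlines():
        line = " ".join(line.split())  # collapse tabs and repeated spaces
        if line and not line.startswith(";"):
            records.append(line)
    return records


def zone_changes(saved_text, current_text):
    """Compare a saved zone file copy with the current zone file.

    Returns (missing, added): records only in the saved copy, and
    records only in the current file.
    """
    saved = zone_records(saved_text)
    current = zone_records(current_text)
    missing = [r for r in saved if r not in current]
    added = [r for r in current if r not in saved]
    return missing, added


# Invented sample data using reserved example names and addresses.
SAVED = """
; copy saved before the upgrade
@     IN  A      192.0.2.10
www   IN  CNAME  example.com.
@     IN  MX 10  mail.example.com.
mail  IN  A      192.0.2.20
"""

CURRENT = """
@     IN  A      192.0.2.10
www   IN  CNAME  example.com.
"""

if __name__ == "__main__":
    missing, added = zone_changes(SAVED, CURRENT)
    for record in missing:
        print("missing:", record)
    for record in added:
        print("added:", record)
```

In this sample the mail records are absent from the current file, which is the kind of change that would cause an e-mail outage while the web-site continues to work.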
E-mail can be stored locally in an office, either on a workstation (typically Microsoft Outlook) or a server. Generally, a backup of the e-mail files is included in any full system backup.
When e-mail is accessed using IMAP4 or web-mail from an external mail server or service (such as Gmail, Yahoo or MSN), it can be more difficult to arrange a replacement service or restore missing data.