Top 8 Data Center Disasters That Can Destroy Your Business
1. Natural Disaster
We prepare and plan for the worst, but sometimes that isn’t enough. Tornadoes, hurricanes (remember Katrina?), earthquakes, floods, and more are not all that uncommon. You may have the best people and the best equipment, but natural disasters can shut down your data center in many ways: direct damage to the facility, employees unable to get in or out, power loss, damaged network connectivity, and more.
Most hardened data centers have a fail-safe location in a different geographic area for this very reason. Depending on how critical your data and access to it are, make sure you understand the threat that natural disasters pose to your operations, both locally and further down the line.
2. Bad Migration
Have you ever tried to change the tire on a moving car? Crazy, you say? Well, when you relocate a live data center from one location to another, that is exactly what is happening. Many issues can pop up and lead to disaster. Worse yet, one failure can lead to another, then another, and then yet another. This compounding effect can bring even the best-planned data center move to a grinding halt. Engineers are tasked with planning and executing data center relocations while maintaining uptime. This is no small feat, and it cannot be taken lightly.
3. Building Disasters
Leaking roofs, fire, accidental sprinkler discharge, or HVAC malfunctions can all stop a data center briefly or even for extended periods. In one recent incident, a workman accidentally hit a sprinkler head with a ladder on the floor above the data center. The water discharge and the resulting leak into the data center, racks, and equipment below caused an immediate, unplanned shutdown. Too many data center managers focus on the equipment and the room itself and forget other parts of the building that can affect operations. Don’t think only about your own area; think about your entire physical structure and its surroundings.
4. Power Failure
If the power goes out, no big problem. The backup generator will cover things. Right? Maybe! When was the generator last run? Will it handle your equipment and the cooling it needs? Even with a UPS, other battery backup, and generators, power and fuel are limited. During major storms, power to some areas can be down for days or even weeks. Consider consolidating equipment and running only priority systems on backup power. This draws less energy and requires less cooling. It might not solve the entire problem, but it can certainly reduce the impact.
Again, a remote fail-safe location in a distinctly different geographic area can help minimize the threat from power failures. Even small companies that rely heavily on transaction processing need to consider such alternatives. A simple option could be multi-site data replication among branch locations.
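To make the load-consolidation idea concrete, here is a minimal Python sketch of priority-based load shedding. It assumes three things:

- Each system has a known power draw and a priority rank.
- Cooling adds a fixed fraction of the IT load. The hypothetical `cooling_overhead` parameter sets this at 50% by default, though real overhead varies by facility.
- The names `System` and `plan_backup_load`, and the sample inventory, are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class System:
    name: str
    priority: int   # hypothetical convention: 1 = most critical
    load_kw: float  # IT power draw in kilowatts

def plan_backup_load(systems, generator_kw, cooling_overhead=0.5):
    """Greedily keep the most critical systems that fit on generator power.

    Each kW of IT load is assumed to need `cooling_overhead` extra kW of
    cooling, so a system's effective draw is load_kw * (1 + cooling_overhead).
    Returns (kept names, shed names, total kW committed).
    """
    kept, shed = [], []
    used_kw = 0.0
    for system in sorted(systems, key=lambda s: s.priority):
        effective_kw = system.load_kw * (1 + cooling_overhead)
        if used_kw + effective_kw <= generator_kw:
            kept.append(system.name)
            used_kw += effective_kw
        else:
            shed.append(system.name)  # power this down during the outage
    return kept, shed, used_kw

# Hypothetical inventory and a 120 kW generator.
inventory = [
    System("payments-db", 1, 40),
    System("web-frontend", 2, 30),
    System("dev-cluster", 3, 50),
    System("file-archive", 4, 20),
]
kept, shed, used = plan_backup_load(inventory, generator_kw=120)
print(kept, shed, used)  # ['payments-db', 'web-frontend'] ['dev-cluster', 'file-archive'] 105.0
```

The loop keeps scanning after a system is shed, so a small low-priority system may still fit into leftover capacity. If strict priority order matters more, stop at the first system that does not fit.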
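Multi-site replication can be illustrated with a short Python sketch. This is a minimal illustration, not a production replication system:

- Each "site" is just a directory path. In practice it might be a mounted share at a branch office.
- `replicate` and `min_copies` are hypothetical names.

The key idea is that a write only counts once enough sites hold a verified copy.

```python
import hashlib
from pathlib import Path

def replicate(data: bytes, name: str, site_dirs, min_copies: int):
    """Write `data` to each site directory and read it back to verify it.

    Returns the sites holding a verified copy. Raises RuntimeError if fewer
    than `min_copies` sites succeeded, so the caller never assumes data is
    protected when it is not.
    """
    expected = hashlib.sha256(data).hexdigest()
    verified = []
    for site in site_dirs:
        target = Path(site) / name
        try:
            target.write_bytes(data)
            if hashlib.sha256(target.read_bytes()).hexdigest() == expected:
                verified.append(str(site))
        except OSError:
            # An unreachable or full site counts as a failed copy.
            continue
    if len(verified) < min_copies:
        raise RuntimeError(
            f"only {len(verified)} verified copies; {min_copies} required"
        )
    return verified
```

A real deployment would also need asynchronous transfer over the WAN, retries, and monitoring, all of which this sketch omits.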
5. Human Error
People are one of the data center’s greatest strengths in some ways and one of its greatest weaknesses in others. Human error can be very simple yet have a huge impact. Unplugging the wrong equipment. Failing to complete patches or installs. Cutting the wrong cables. Such everyday tasks can become a disaster in an operating data center. Proper procedures and quality control checks can help minimize the human factor, but they will most likely never eliminate it.
Documentation, procedures, training, double checks, and quality control steps are vital for minimizing human error. Following the Information Technology Infrastructure Library (ITIL), the most widely accepted approach to IT service management (ITSM), helps ensure that you follow best practices. This consistent, well-thought-out approach is invaluable.
6. Network Outage
Although rare, network outages do occur. Switching centers can go down. Cables can be cut. Redundant network data lines strengthen the network fabric of your data center by providing a backup path if the main data path fails. Prepare and test your network connectivity backup plans. Make sure that auto-switching and traffic-sensing functionality work correctly. Remember that a broken backup or failsafe routine is worth no more than having none at all.
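The point about testing failover can be illustrated with a small Python sketch. It makes these assumptions:

- The uplinks form an ordered list. The names `carrier-a` and `carrier-b` are hypothetical.
- The caller supplies a `probe` function that reports whether a path is healthy. In practice this might be a ping or a `socket.create_connection((host, port), timeout=2)` check against a far-end address.

`drill_failover` simulates losing the primary to confirm that a backup would actually carry traffic.

```python
from typing import Callable, Sequence

def choose_path(paths: Sequence[str], probe: Callable[[str], bool]) -> str:
    """Return the first path, in preference order, whose health probe passes."""
    for path in paths:
        if probe(path):
            return path
    raise RuntimeError("no healthy network path available")

def drill_failover(paths: Sequence[str], probe: Callable[[str], bool]) -> str:
    """Pretend the primary path is down and return the backup that takes over.

    Raises RuntimeError if no backup passes its probe, meaning the failover
    plan exists on paper only.
    """
    primary = paths[0]

    def probe_with_primary_down(path: str) -> bool:
        return path != primary and probe(path)

    return choose_path(paths, probe_with_primary_down)

# Usage with a stand-in probe that reports every link as healthy.
paths = ["carrier-a", "carrier-b"]
print(choose_path(paths, lambda p: True))     # carrier-a
print(drill_failover(paths, lambda p: True))  # carrier-b
```

Running a drill like this on a schedule, rather than only during a real outage, is what turns a backup line into a backup you can trust.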
7. Data Corruption
Silent data corruption can be introduced at the I/O interface level by faulty I/O logic. Use proper checksums to verify read and write operations. Don’t scrimp by buying the cheapest cards and drives; make sure they are rated for the heavy, critical use they will see. Even simple hard drive failures add up. Replace failed RAID drives and rebuild the array as soon as possible. Choosing the proper RAID level for the task’s requirements is just as important. Cutting corners on the RAID level purely for speed can backfire if you then lose one drive too many.
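Two of the points above lend themselves to a short Python sketch:

- Detecting silent corruption by comparing data against a checksum recorded at write time. This is the idea behind periodic "scrubbing".
- Checking how many drive failures a given RAID level is guaranteed to survive.

The function names and the two-way-mirror assumption for RAID 10 are illustrative. Real controllers and file systems (ZFS, for example) implement checksumming and scrubbing themselves.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def scrub(blocks: dict, recorded: dict) -> list:
    """Return ids of blocks whose contents no longer match the checksum
    recorded when they were written (i.e. silent corruption)."""
    return [bid for bid, data in blocks.items() if checksum(data) != recorded.get(bid)]

MIN_DRIVES = {"RAID0": 2, "RAID1": 2, "RAID5": 3, "RAID6": 4, "RAID10": 4}

def tolerated_failures(level: str, drives: int) -> int:
    """Drive failures an array is guaranteed to survive (worst case)."""
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    if level == "RAID0":
        return 0            # striping only: any failure loses the array
    if level == "RAID1":
        return drives - 1   # n-way mirror: one surviving copy is enough
    if level == "RAID5":
        return 1            # single parity
    if level == "RAID6":
        return 2            # double parity
    # RAID10 with two-way mirrors: losing both drives of one pair is fatal,
    # so only one failure is guaranteed to be survivable.
    return 1

# Usage: record checksums at write time, then simulate silent corruption.
data = {0: b"ledger page 1", 1: b"ledger page 2"}
recorded = {bid: checksum(d) for bid, d in data.items()}
data[1] = b"ledger page 2?"
print(scrub(data, recorded))           # [1]
print(tolerated_failures("RAID0", 4))  # 0
print(tolerated_failures("RAID6", 6))  # 2
```

The RAID figures are worst-case guarantees. RAID 10 can survive more failures if they land in different mirror pairs, but planning should assume the worst case.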
8. Hacking and
Malicious attacks by hackers are a modern reality. There are bad people in the world who enjoy doing bad things, and data centers are prime targets: the deeper the impact, the bigger the thrill for hackers. Physical and electronic security are major factors in protecting your data from hackers. Not only do you need to ensure that your firewalls are secure and that building access is properly locked down, but you must also ensure that backup media and archive tapes are transported off-site securely, ideally encrypted. A single lost LTO tape holding customer information, credit card numbers, and Social Security numbers is not only a security nightmare but a public relations nightmare as well.
Regardless of the potential threat, data center security is no accident. It requires consistent planning, testing, execution, and revision to keep uptime up and downtime out.
Added on 05/14/2012