Connections +
Feature

Engineering & Design – Preparing for the Worst



July 1, 2000  



We hope it doesn’t happen, but if disaster strikes your business, you need to be prepared. Here is a look at what to consider when designing a plan to deal with calamities, mishaps and downright catastrophes.

The Funk & Wagnalls dictionary defines disaster as “an event causing great distress or ruin”. While this can have a variety of meanings, in the business world it can easily spell “devastation” to a company.

We may not be able to prevent “Acts of God” from happening, but we can certainly develop a plan to cope with them. And when it comes to “disaster planning”, there are really two major aspects to look at: business requirements and technology dependencies.

Before you do anything else, it is very important to get an understanding of a company’s operations and service(s), in addition to its customers and the level of service they expect. This information will help to determine the financial penalty if no plan exists and tailor a disaster plan to meet the needs of the business.

Once you have a clear picture of the business, you can then focus on how it uses technology to deliver its service and pinpoint potentially weak links. Below are a few essential items to think about when planning for the worst.

SITE SELECTION

When selecting a business location, there are a few key items to consider. First, does (or can) your building or space obtain electrical power from separate entrance points? If the business is required to provide ‘24/7’ service, reliable power is important. You may also wish to check with the hydro company to determine how many outages the building and surrounding area have experienced.

Secondly, if the business has a heavy reliance upon telecommunications services, you may want to make sure you can get these services from two separate telecommunications providers.

Proximity to sources of flooding is another issue to consider. If you are providing critical services you should take a careful look at the surrounding area, the terrain and the age of water and sewage infrastructure in the area. If the building is located in a depressed area of terrain, it is only natural to have rainfall or snow run-off flow towards the building. The surrounding water run-off system should have the capacity and capability to deal with the “storm of the century”.

Most businesses today have some form of call centre configuration to provide customer service. In order to ensure around-the-clock service, some businesses have both a primary and a secondary call centre, located in different cities, provinces or countries. In the unlikely event of a disruption in service at the primary location (e.g., a complete power outage or fire), the secondary location can continue to provide service and help maintain customer loyalty.

REDUNDANCY AND DIVERSITY

Since the deregulation of the telecommunications industry, some companies rely on voice and data services from several access and service providers. When using alternative providers, it would be wise to ensure they are using their own facilities and have a separate entrance point. It makes little sense to go through the exercise of securing an alternative provider if they utilize the same entrance pathway, cable plant and central office facilities as the primary provider.

There are several options available to businesses today. One of these, Carrier Redundancy, can involve a second wireline-based service, a wireless provider or something as simple as a cell phone.

Another option, Central Office Diversity, involves voice and data services provided from two separate central offices. In the event of a failure in the primary central office, services would be provided by the secondary one.

Typically, if you were to incorporate either Carrier Redundancy or Central Office Diversity, you would ensure that the entrance facilities enter the building at different points (e.g., the north and south sides of the building). This prevents a cable cut on one side of the building from disrupting the services provided from the other.
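The comparison described in this section can be sketched as a simple check for physical elements that two service paths have in common. This is only an illustration: the `ServicePath` record, its three fields and the carrier names are hypothetical, and a real assessment would rely on route information supplied by the providers themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePath:
    """Hypothetical record of the physical route one provider's service takes."""
    provider: str
    entrance: str        # side of the building where the cable enters
    cable_plant: str     # owner of the outside cable plant
    central_office: str  # serving central office


def shared_elements(a: ServicePath, b: ServicePath) -> list[str]:
    """Return the physical elements that both paths depend on."""
    fields = ("entrance", "cable_plant", "central_office")
    return [name for name in fields if getattr(a, name) == getattr(b, name)]


# Illustrative data only: Carrier B resells Carrier A's facilities.
primary = ServicePath("Carrier A", "north", "Carrier A plant", "CO-1")
reseller = ServicePath("Carrier B", "north", "Carrier A plant", "CO-1")
diverse = ServicePath("Carrier C", "south", "Carrier C plant", "CO-2")

print(shared_elements(primary, reseller))  # ['entrance', 'cable_plant', 'central_office']
print(shared_elements(primary, diverse))   # []
```

An empty list means the two paths share none of the listed elements. Any name that does appear is a single point of failure that the second provider does not remove.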

COOLING AND POWER

Once inside the building and into the respective (or combined) computer/telephone equipment rooms, it is important to have a close look at power and HVAC (Heating, Ventilation and Air Conditioning) requirements.

When planning the HVAC for a computer room, you may want to split the load requirements across two A/C units. While this will increase the initial capital cost and the on-going maintenance costs, it does ensure that if one A/C unit fails there would be sufficient cooling provided by the other to allow for the continual operation of mission-critical systems. It is a good idea to ensure that A/C units are able to accommodate at least five years of anticipated growth.
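As a rough sizing sketch, the capacity each of the two A/C units would need can be projected from today's mission-critical heat load and an assumed growth rate. The 20 kW load and 10 per cent annual growth below are made-up figures, and the compound-growth formula is an assumption for illustration rather than an HVAC engineering method.

```python
def required_unit_capacity_kw(critical_load_kw: float,
                              annual_growth: float,
                              years: int = 5) -> float:
    """Capacity each of two A/C units needs so that either one alone can
    cool the projected mission-critical load after `years` of growth."""
    if critical_load_kw < 0 or years < 0:
        raise ValueError("load and years must be non-negative")
    return critical_load_kw * (1 + annual_growth) ** years


# Hypothetical inputs: 20 kW of critical load growing 10% per year.
capacity = required_unit_capacity_kw(20.0, 0.10)
print(f"{capacity:.1f} kW per unit")  # 32.2 kW per unit
```

Sizing each unit for the full projected critical load is what drives the higher capital cost mentioned above, since the pair together provides roughly twice the cooling needed on a normal day.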

Most medium- to large-sized private branch exchanges (PBXs) are capable of utilizing DC (Direct Current) power. This is the preferred power as it is cleaner and more stable than AC (Alternating Current). Having said that, most computer rooms today use some form of uninterruptible power supply (UPS) system. (For more information, please see UPS Markets Surge Ahead on p. 54.)

UPS systems are a great way to “clean” and stabilize commercially provided power. As processors continue to operate at increasingly faster speeds, it becomes even more critical to ensure that the power source for these devices is as reliable as possible. Failure to do so can cause logic errors within the processor, leading to either process failures or system collapse.

In the event of a commercial power failure, most UPS systems are also capable of continuing to provide power, as each is generally equipped with batteries. The number of batteries the UPS system is equipped with, and the load placed upon it, will determine how long the systems will continue to operate. It is typically recommended that the batteries be capable of supporting the load for a minimum of 15 minutes. This allows sufficient time for each of the respective processors and systems to be powered down in a controlled fashion, which minimizes data corruption. This is particularly critical with UNIX servers.
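The relationship between battery count, load and run time can be approximated with a first-order energy calculation. The sketch below is a simplification under stated assumptions: the watt-hour rating per battery, the 90 per cent efficiency factor and the 3,000 W load are hypothetical, and real batteries deliver less energy at high discharge rates, so a vendor's run-time chart should take precedence.

```python
MIN_RUNTIME_MINUTES = 15  # minimum recommended in the text above


def estimated_runtime_minutes(battery_count: int,
                              wh_per_battery: float,
                              load_watts: float,
                              efficiency: float = 0.9) -> float:
    """First-order estimate: usable stored energy divided by the load."""
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    usable_wh = battery_count * wh_per_battery * efficiency
    return usable_wh / load_watts * 60


# Hypothetical figures: 100 Wh batteries feeding a 3,000 W load.
for count in (8, 10):
    minutes = estimated_runtime_minutes(count, 100, 3000)
    verdict = "meets" if minutes >= MIN_RUNTIME_MINUTES else "falls short of"
    print(f"{count} batteries: {minutes:.1f} min, {verdict} the 15-minute minimum")
```

With these figures, eight batteries give an estimate of about 14.4 minutes and ten give about 18 minutes, which shows how either adding batteries or shedding load moves a system past the 15-minute mark.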

PBXs have typically been equipped with batteries capable of supporting services for a period of two to four hours or longer. This is sufficient time to invoke any sort of ‘Business Resumption Plan’ (e.g., redirecting staff and services to an off-site location) and begin to assess system recovery time.

Depending upon the environment, each site should be equipped with a back-up source of power in the form of a generator (particularly hospitals, banks, casinos, etc.). These Emergency Power Systems (EPS) are capable of providing power for days, months or longer. The limiting factors are usually fuel availability and the overall condition of the generator.

It is highly recommended that the entire system of EPS, UPS and batteries be tested at least once a year. Some companies and institutions even test these systems weekly.

DATA BACK-UPS

On the operational side of things, it is very important to have a regular back-up process for all servers and systems, including the PBX. Typically, partial database back-ups are automatically performed nightly with full system back-ups taken on the weekend. These tapes should be stored in a fire-proof vault or, preferably, off-site.
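The nightly-partial, weekend-full rotation described above can be expressed as a small scheduling rule. This is a minimal sketch: the function name is hypothetical, and the choice of Saturday for the full back-up is an assumption, since the text says only "on the weekend".

```python
from datetime import date


def backup_type(day: date, full_weekday: int = 5) -> str:
    """Return "full" on the chosen weekend day and "partial" on other nights.

    `full_weekday` follows date.weekday(): Monday is 0, Saturday is 5.
    """
    return "full" if day.weekday() == full_weekday else "partial"


print(backup_type(date(2000, 7, 1)))  # full (a Saturday)
print(backup_type(date(2000, 7, 3)))  # partial (a Monday)
```

A scheduler or operator checklist could call a rule like this each night to decide which job to run and which tape set to send off-site.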

As for the PBX, the only time data typically changes is when either an upgrade has occurred or MAC (Moves, Adds and Changes) activity has taken place. Upon completion of these activities, the system profile should be saved to a back-up set of disks or tapes and stored off-site.

BACKBONE CABLING

Depending upon the structure of the building and the nature of the occupant, it may be advisable to run redundant backbone cabling to each respective Telecommunications Room (TR), each taking a diverse route. Like entrance diversity, this would ensure uninterrupted communications between the equipment in the computer room and the equipment in each of the respective TRs. If something happened to the primary cable, service could continue via the back-up cable.

With some networking equipment today, both the primary and secondary cabling can be connected directly. This allows for immediate switchover to the secondary signal source. However, even if the equipment does not have this feature, service can be restored relatively quickly by manually relocating the patch cords at both ends, from the primary to secondary cabling.

EQUIPMENT DIVERSITY

Depending upon the network configuration and the equipment used, redundancy can be built in. This can include load sharing/balancing power supplies, “hot-swappable” (power can be on during replacement) power supplies, and controller and peripheral cards. These all allow for quick service restoration to the user or server community.

With respect to power in the TRs, it is recommended that multiple dedicated outlets be provided for networking equipment only. These outlets should be on separate phases within the electrical service panel. On occasion, a phase can lose power; by having a second outlet on a different phase, the equipment will continue to operate, particularly in the case of redundant power supplies. At the very least, power can be quickly restored by manually moving the power plug(s).

In some environments, particularly in the case of trading floors, each trader position generally has duplicate cable terminations from duplicate and diverse TRs. This is certainly the extreme. However, when it means potentially making or losing millions of dollars a second, financial institutions can easily justify this extravagance.

TO ERR IS HUMAN

At the end of the day, disaster planning really comes down to several things: How important is it to your business? How much can you afford? And can you afford not to plan for the inevitable?

Remember, human beings are not perfect, and anything we make will at some point certainly fail. If we can imagine it, it can happen. But if it does, at least you have a plan. Right?

Mark Maloney, RCDD is a Senior Consultant at Ehvert Technology Services, a professional technology services company in Toronto.

