DJ Boarding System Problems

Status
Not open for further replies.
Yes, I have designed systems with even lower RPO (1 hour for some systems), but RTO still generally remains much higher. I think the lowest RTO I have seen for a true off-site DR solution is 12 hours and that was just for the core application with other apps being 24 to 48 hours in a prioritised list.

But in the case of the recent Navitaire system failure, how long was the system outage? Unless they have a hot-standby processing centre, it's unlikely the recent event would have pulled the trigger on activation of the plan to move processing to an alternate location.

Well I certainly hope that Joe is not taking his coffee into the data centre computer room environment ...

Component-level failure should be covered by local redundancy. So an outage that takes out an entire application/system for several hours should not be caused by a spilled coffee or even a single hardware failure. But we all know that no matter how well you design a system, there will always be unexpected failure modes that result in unexpected outages of some form.

So the end result is that it most likely is possible to design a high-availability solution that would have provided (almost) continuous operations for this system. However, the cost to implement and operate the HA solution may not be justified against the cost and risk of such a failure. It all comes down to risk management and ultimately that is a $$$ judgement call.
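To put rough numbers on that judgement call, here is a minimal sketch of the usual comparison: expected annual loss from outages versus the yearly cost of the HA solution. Every figure and function name below is a made-up assumption for illustration, not anything known about the Navitaire or Virgin systems.

```python
def expected_annual_loss(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly cost of outages: frequency x duration x hourly impact."""
    return outages_per_year * hours_per_outage * cost_per_hour


def ha_is_justified(annual_ha_cost, loss_without_ha, loss_with_ha):
    """HA pays for itself only if the loss it avoids exceeds what it costs."""
    return (loss_without_ha - loss_with_ha) > annual_ha_cost


# Hypothetical inputs: one major outage every two years, $100k impact per hour.
without_ha = expected_annual_loss(0.5, 12, 100_000)   # 12-hour outage without HA
with_ha = expected_annual_loss(0.5, 0.5, 100_000)     # 30-minute failover with HA

# Hypothetical HA running cost of $2M per year.
print(ha_is_justified(2_000_000, without_ha, with_ha))
```

With these invented inputs the avoided loss is well under the HA cost, which is the sort of result that leads to the "not justified" decision. Change the outage frequency or hourly impact and the answer flips.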

It's great to talk about all of this, to a degree. Reminds me a lot of all the theory they taught us in my BInfTech about redundancy, architecture and risk management. Of course, in the days that I did my degree, management of IT infrastructure was predominantly biased towards lowest cost without much regard to reliability or minimisation of failure (e.g. single point of failure / accountability). I don't know if the culture has changed in the years since.

Back on topic, we'll probably never really know whether the failure could've been covered by adequate infrastructure (i.e. inadequate management of risks), or whether this was just a "Swiss cheese holes" incident. On top of that, there is the contractual obligation between Virgin and Navitaire for uptime - if Navitaire had written in anything less than 100% uptime in their service contract, that does give them a "leeway" for failure, even though we all know that failure is quite unpleasant, as just witnessed.
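For a sense of how much "leeway" a sub-100% uptime figure gives, the arithmetic can be sketched as below. The percentages are common examples only. The actual terms of the Virgin/Navitaire contract are not known here, and real agreements often measure over a month and exclude certain events.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Hours of outage permitted in the period at a given uptime percentage."""
    return period_hours * (1 - uptime_percent / 100)


# Example uptime levels (illustrative, not taken from any real contract).
for pct in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

So even a 99.9% commitment leaves room for an outage of several hours in a year without the provider being in breach.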
 

How else is he going to drink his coffee whilst making his noodles for lunch? You would be surprised at some of the recommendations I have made - including "don't use the computer/server room to store waste cardboard and flammable chemicals"

I'm surprised you didn't call in the OHS authorities really. That's bordering on negligence.
 
Back on topic, we'll probably never really know whether the failure could've been covered by adequate infrastructure (i.e. inadequate management of risks), or whether this was just a "Swiss cheese holes" incident. On top of that, there is the contractual obligation between Virgin and Navitaire for uptime - if Navitaire had written in anything less than 100% uptime in their service contract, that does give them a "leeway" for failure, even though we all know that failure is quite unpleasant, as just witnessed.
Depends completely on the cause of the outage, as I expect there are elements outside Navitaire's control that are excluded from uptime measurements in the service level agreements. Was the failure within the data centre or related to the communications services between the Navitaire data centre and the application users at the airports? If so, are the communications services covered by the service contract or is an event initiated by a JDCF (John Deere Cable Finder) not the responsibility of Navitaire? I assume we won't know the answers to these questions and hence cannot really comment on the event with any factual knowledge unless one or other party makes a public statement about the root cause of the service failure, which I expect is highly unlikely to happen.

But speculation can make for interesting discussion on such forums ;)
 
How else is he going to drink his coffee whilst making his noodles for lunch? You would be surprised at some of the recommendations I have made - including "don't use the computer/server room to store waste cardboard and flammable chemicals"
I probably would not be all that surprised ... and can probably share a few good stories myself ;)
 
But speculation can make for interesting discussion on such forums ;)

Reports are that several AHUs tripped. Outages didn't occur until about an hour after the initial power failure.
 
I probably would not be all that surprised ... and can probably share a few good stories myself ;)

I don't see any issue with putting backup tapes in a fireproof safe...
 
I don't see any issue with putting backup tapes in a fireproof safe...
One of the best I came across was a production data centre that was diligent about testing the power backup system regularly. They tested it monthly and every time the diesel generator started and kicked in well before the UPS batteries became critical. Then one day they had a real power failure at the facility and the generator would not start and the UPS batteries kept things going briefly before failing. It turns out that the electric fuel pump for the diesel engine was not on the UPS and when the mains power failed so did the fuel pump! Oops, a slight oversight in the design and testing procedure.
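That kind of hidden dependency is the sort of thing a simple inventory check can flag on paper before a real power failure does it for you. A minimal sketch, assuming a hand-maintained list of which power source each component is wired to. All component names and the data layout are hypothetical.

```python
# Hypothetical inventory: component -> the power source it is wired to.
power_source = {
    "servers": "ups",
    "generator_starter": "battery",
    "generator_fuel_pump": "mains",   # the oversight in the story above
    "cooling": "generator",
}

# Components that must keep working for the generator to start and run.
generator_needs = ["generator_starter", "generator_fuel_pump"]


def mains_dependent(needs, sources):
    """Return the needed components that lose power when the mains fails."""
    return [name for name in needs if sources.get(name) == "mains"]


print(mains_dependent(generator_needs, power_source))
```

A monthly test that starts the generator while the mains is still live would never exercise this path, which is why the test procedure itself needs to simulate loss of mains.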
 