DJ Boarding System Problems

Knock-on effects are still ongoing. My 5pm flight to BNE has just been delayed to 8pm. Joy.
 
This is why I try to keep away from LCCs where possible (such as VA, Jetstar and Tiger). Yes, it is true that any system can have problems, but there have to be good, robust manual processes in place and plenty of proactive announcements to keep people informed. Judging by some family friends who were travelling on VA today, and by the VA Facebook page, it's not hard to see that hardly any information is being given to people at the airports.
For example, have a look at SYD airport's departure timings for today (both domestic and international): you will find most of VA's flights were delayed by 2 to 2.5 hours on average, while Jetstar, who were also down, averaged about an hour's delay.
Even their own website says bugger all except a couple of lines.
Travel Alerts | Virgin Australia
 

Yes - considering the Sept 2010 problems may have cost VA around $20M (which may be a bit overblown IMO), then even if today's disruptions only cost $1M in extra wages, lost revenue, planes and aircrews being out of position, back-office manual paperwork, cancellations, refunded exit-row fees, aircraft stranded once the SYD curfew kicks in tonight, etc., and considering that VA knows Navitaire can fall over at the drop of a hat, I guess they can justify spending at least $1M training everyone to use Sabre? And training everyone in manual backups if Sabre ever falls over?
 
Anyone know if the problems are still going today? I'm supposed to be flying back from ADL and will try to get to the airport for an earlier flight if there are still delays.


 
Update: Apparently a power failure at a data centre in Sydney is the cause of everyone's pain this morning.

Why a single power failure caused so much havoc, I don't know...


My understanding this morning is that while the UPS was working fine, the air conditioning failed which led to some shutdowns.
 

Cool thanks - will keep an eye on the flight info!


 
I'm already checked in online - here's hoping for an op-up at the gate!


 
Worth adding that the boarding pass didn't have my Velocity number on it, and the flight hasn't shown up in my points.

When I tried to claim missing points it said I didn't board the flight - gonna have to ring I think
 
Maybe the approach to VA could be: "if you have the record of the booking being made, then prove to me that I did not board the flight; otherwise it's VA's responsibility to credit the correct SC and points". I am sure the points and SC will eventually be credited once you call them, but I would imagine it will be a very slow and drawn-out process, as they will have to cross-check all bookings and missing points claims against the manual paperwork.
 
You'd have thought they'd have a second data centre somewhere else that could be switched to...



All systems have troubles from time to time. That's why they have manual processes in place.

Given that they are so dependent on their systems to operate I am gobsmacked that they don't have a Disaster Recovery functionality that can switch on a "replacement" system much more quickly.

Yes - considering the Sept 2010 problems may have cost VA around $20M (which may be a bit overblown IMO), then even if today's disruptions only cost $1M in extra wages, lost revenue, planes and aircrews being out of position, back-office manual paperwork, cancellations, refunded exit-row fees, aircraft stranded once the SYD curfew kicks in tonight, etc., and considering that VA knows Navitaire can fall over at the drop of a hat, I guess they can justify spending at least $1M training everyone to use Sabre? And training everyone in manual backups if Sabre ever falls over?

What about the people that "will never fly Virgin again" or will see VA as an LCC because of this? Brand damage is a bigger issue than the immediate costs.
 

But they already are an LCC (just an LCC with an FF programme and a lounge) :)
 
Given that they are so dependent on their systems to operate I am gobsmacked that they don't have a Disaster Recovery functionality that can switch on a "replacement" system much more quickly.

I wonder if the fault is rooted at Virgin or at Navitaire (just as Qantas' would be with Amadeus). If the GDS is at fault there's not much to be done, because you would then be demanding that the GDS company build the appropriate redundancy infrastructure; even if you paid the costs for it (capex & opex) they might still be reluctant to do so. I'm guessing that one might not be able to copy the data off the GDS as one's own form of redundancy.

Of course, GDSes have contracts with airlines that state performance levels (I would assume); given large enough damage due to a non-functioning system, the GDS company could be liable for damages. It certainly won't make up for the brand damage, but then again the only solution would really be to run your own GDS completely in-house (where you have near full control over both the software and hardware systems).

Some days the Swiss cheese holes line up.

What about the people that "will never fly Virgin again" or will see VA as an LCC because of this? Brand damage is a bigger issue than the immediate costs.

Happened before; didn't see VA's reputation go into the toilet that time - why this time?

And people saying they will "never fly X airline" ad nauseam... most of them, if they weren't hypocrites, would never be flying again, so help me God, based on their strict word.
 
Given that they are so dependent on their systems to operate I am gobsmacked that they don't have a Disaster Recovery functionality that can switch on a "replacement" system much more quickly.
A DR plan normally takes considerable time to enact. The term used for rapid fail-over to an alternate location/system is generally "Business Continuity". The technical requirements for a BC plan are generally a lot more complex than for a DR plan. In many cases, a DR plan may have an RTO (Recovery Time Objective) in excess of 24 hours and an RPO (Recovery Point Objective) of 8-12 hours. So a traditional DR plan may well be in place but is not going to be activated unless there is a declared disaster event, which implies a return to normal operations is at least 24 hours away.
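
To put some (entirely made-up) numbers on the difference, here's a rough Python sketch of how you'd assess an outage against an RPO and an RTO - the objectives and timestamps below are illustrative only, not anyone's real targets:

```python
from datetime import datetime, timedelta

# Illustrative (made-up) objectives for a traditional DR plan
RPO = timedelta(hours=12)   # maximum acceptable data loss
RTO = timedelta(hours=24)   # maximum acceptable time to restore service

def assess_outage(last_good_copy, outage_start, service_restored):
    """Compare one outage against the stated RPO/RTO."""
    data_loss = outage_start - last_good_copy    # work done since the last replica/backup
    downtime = service_restored - outage_start   # time until users are back online
    print(f"Data loss window: {data_loss} (RPO {'met' if data_loss <= RPO else 'missed'})")
    print(f"Downtime:         {downtime} (RTO {'met' if downtime <= RTO else 'missed'})")

# Example: overnight backup, morning failure, service restored mid-afternoon
assess_outage(last_good_copy=datetime(2012, 9, 26, 2, 0),
              outage_start=datetime(2012, 9, 26, 8, 30),
              service_restored=datetime(2012, 9, 26, 15, 0))
```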

A BC plan requires a different design. For example, you need to have the data available in both locations in "almost" real time. That requires data storage replication/synchronisation between the hosting locations and intelligence in the management of active application hosting. Yes, this can be done, and is done by many organisations. But it comes with a significant cost overhead compared with a traditional DR plan. If there are manual processes available, it may well be considered an acceptable risk to fall back on manual processes during an outage rather than paying the ongoing operational costs to implement a hot-standby processing centre with a Business Continuity plan.
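
A toy sketch of why the replication lag matters for that kind of hot-standby design - the "batch size" here is just a stand-in for how far behind the second site is allowed to fall, and all the numbers are invented:

```python
# Toy model (invented numbers): whatever has not yet reached the second site is
# what you lose when the primary site drops out.

def writes_at_risk(total_writes, ship_every):
    """Ship committed writes to the standby site in batches of `ship_every`.

    ship_every=1 approximates synchronous replication (the standby is never more
    than one write behind); larger batches approximate asynchronous replication
    with a wider potential data-loss window.
    """
    standby, pending = [], []
    for write_id in range(total_writes):
        pending.append(write_id)
        if len(pending) >= ship_every:
            standby.extend(pending)   # batch arrives at the second site
            pending.clear()
    return pending                    # still only on the primary = lost on site failure

# Suppose the primary site dies suddenly after 1234 committed writes
for batch in (1, 50, 500):
    print(f"ship_every={batch:3d}: {len(writes_at_risk(1234, batch)):3d} writes lost")
```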
 
We have RPOs as low as four hours. We also have systems that are replicated in real time, so we have actually switched servers (simulating a failure of one) and the business didn't notice.

I used DR as a generic term, as the failure of a single system would often not be a DR or BC situation. It is also something that I find businesses do not plan for adequately - they invest many $ in BCP or DR to cover the entire business or to get up and running when a building burns down, but don't have much of a plan for when Joe spills a cup of coffee over the server hosting the core applications.
 
We have RPOs as low as four hours. We also have systems that are replicated in real time, so we have actually switched servers (simulating a failure of one) and the business didn't notice.
Yes, I have designed systems with even lower RPOs (1 hour for some systems), but the RTO still generally remains much higher. I think the lowest RTO I have seen for a true off-site DR solution is 12 hours, and that was just for the core application, with other apps being 24 to 48 hours in a prioritised list.

But in the case of the recent Navitaire system failure, how long was the system outage? Unless they have a hot-standby processing centre, it's unlikely the recent event would have pulled the trigger on activating the plan to move processing to an alternate location.
I used DR as a generic term, as the failure of a single system would often not be a DR or BC situation. It is also something that I find businesses do not plan for adequately - they invest many $ in BCP or DR to cover the entire business or to get up and running when a building burns down, but don't have much of a plan for when Joe spills a cup of coffee over the server hosting the core applications.
Well I certainly hope that Joe is not taking his coffee into the data centre computer room environment ...

Component-level failure should be covered by local redundancy. So an outage that takes out an entire application/system for several hours should not be caused by a spilled coffee or even a single hardware failure. But we all know that no matter how well you design a system, there will always be unexpected failure modes that result in unexpected outages of some form.

So the end result is that it most likely is possible to design a high-availability solution that would have provided (almost) continuous operations for this system. However, the cost to implement and operate the HA solution may not be justified against the cost and risk of such a failure. It all comes down to risk management and ultimately that is a $$$ judgement call.
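
Just to put rough numbers on that judgement call, a back-of-the-envelope sketch (every figure below is plucked out of the air, not VA's or anyone else's real costs):

```python
# Back-of-the-envelope version of the "$$$ judgement call" above.
# Every figure is invented for illustration, not anyone's real costs.

ha_cost_per_year = 3_000_000       # build and run a hot-standby (BC/HA) capability
cost_per_outage = 1_000_000        # direct cost of one day-long outage handled manually
outages_per_year = 1.0             # how often the system is expected to fall over
brand_damage_factor = 2.0          # fuzzy allowance for reputational damage on top

expected_annual_loss = outages_per_year * cost_per_outage * brand_damage_factor

print(f"Expected annual outage loss: ${expected_annual_loss:,.0f}")
print(f"Annual cost of HA solution:  ${ha_cost_per_year:,.0f}")
print("HA pays for itself" if expected_annual_loss > ha_cost_per_year
      else "Manual fallback is the cheaper bet (on these assumptions)")
```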
 

Well I certainly hope that Joe is not taking his coffee into the data centre computer room environment ...

How else is he going to drink his coffee whilst making his noodles for lunch? You would be surprised at some of the recommendations I have made - including "don't use the computer/server room to store waste cardboard and flammable chemicals"
 