[Maintenance] Unexpected outage - 26th July 2005 @ 16:35
(DOWN)
TullettJ (MoneyAM)
- 26 Jul 2005 17:33
Hello,
We experienced a small outage just after close due to a partial hardware failure.
As the failure was only partial, we weren't able to switch that unit over to its backup, which meant the site had to go down in order for us to fix it - a situation that isn't ideal.
I'll be working on ways to recover more quickly in the event of another partial outage during the rest of this week.
If you have any questions regarding this, please post below and I'll do my best to answer them.
Well, as a previous technical director for a FTSE 50 insurance company with 10,000 users dependent on my mainframe services, I was allowed 1 service interruption in 12 months; 2 was unacceptable and I would have lost my job.
On the whole I think MAM do a reasonable job in maintaining service levels - cos I understand how difficult it is to keep a service up and running - especially when dependent on others.
However - you do need to test those service recovery plans regularly and make sure that those contingency plans work - otherwise you will get fair criticism.
It's only very rarely that we have any kind of hardware failure (I can think of 3 instances since launch, of which 2 caused any kind of outage during the trading day), and the more frequent problems do seem to revolve around our feed.
With regard to our feed, we are working together with our suppliers to increase our resilience to problems, but as you'd understand, these things can't be done immediately - and there are a number of stages involved in this process.