Thursday, June 22, 2006

Non-Recovery vs. Disaster Recovery

An interesting news story from ZDNet Australia:

"The NSW Department of Primary Industries (NSW DPI) has conceded its technology consolidation project is still suffering delays due partly to an incident which saw much of the computer equipment in its facility in regional NSW "slow-baked" in searing temperatures.

Warwick Lill, host systems manager, NSW DPI, said a faulty fire detection system at the start of the year had contributed to the agency's data centre reference implementation (DCRI) project, an 18-month long undertaking, running several months behind schedule.

The problem occurred at NSW DPI's site at Maitland in the Hunter Valley.

"At about 10 minutes to midnight on New Year's Eve there was a false alarm in the fire detection system in Maitland, which shut down the air conditioning in the computer room without shutting down the computers," Lill told attendees at Gartner's data centre conference in Sydney.

The state experienced record temperatures on New Year's Day.

"This wasn't detected until about 11 o'clock in the morning," he said.

"By that time the temperature inside the computer room was up around 70 degrees or something.

"That sort of slow-baked all of the equipment that was in the room, and we're in essence still recovering from that one. So work at the Maitland site in regards to this project has come to a stop until we manage to fix that."

The DCRI project aims to consolidate a large swathe of information technology operations inherited from the NSW DPI's predecessor agencies to sites in Maitland and Orange under moves to generate efficiencies and cut costs. The NSW government created the DPI in July 2004 through the amalgamation of Mineral Resources NSW, NSW Agriculture, NSW Fisheries and State Forests NSW.

"We had too many IT managers, we had too many systems, we had too many ways of doing things," said Lill.

The DCRI project has required extensive planning and preparation in areas such as disaster recovery and system layout.

"We wanted to set up a mutual failover between those two sites, that's the basis of our disaster recovery plan," said Lill.

"We wanted to rationalise the system landscape that we had, so we had dedicated systems for production, user acceptance testing, development, DR [disaster recovery] and so on."

NSW DPI has also acquired a range of hardware and software for the project, primarily from services partner Sun.

"What we've put in, in order of implementation, was a management cluster with dedicated storage at each of our two major sites."

This project was not without its problems, however.

"We moved a little too quickly from the planning phase into the implementation phase, and that combined with the fact that we didn't know quite as much about clustering as we might've needed to do, we didn't quite get the design for the original management cluster right.

"It took a period of time to recognise that and to agree to redo it, so that's put us behind by about three months."

Also implemented were new tape libraries at both sites, and Sun's C2MS (e-mail archiving), SAM-FS (archive management) and Management Centre (system management) software.

The choice of EMC Legato NetWorker as enterprise backup product, however, had been another contributor to delays in the project.

"We've found that the Legato backup software, the database agents for Legato backup software, are not certified for operation in Solaris 10 containers. So at the moment we can't back up our databases if we move them to the application cluster.

"So we're looking forward to a solution to that one in the reasonably near future," he said.

Lill described the work of services partner Sun as "terrific," but conceded there had been early problems in the relationship between the two.

"We bought a design I believe ... whereas Sun sold us a set of hardware and software and professional services during the implementation.

"Initially at least there was a bit of a disconnect between those two, but through communication and influencing and so on we've managed to bring things pretty well together.

"If I was doing it again I think I'd spend more time in the contract negotiation stage looking at the very fine detail of what was proposed."

Lill said he was confident the project would be completed and work well."

It's interesting that, if he were doing it again, the key thing he'd concentrate on is the specification of what he was getting. An effective acceptance process with clear acceptance criteria is a key step in getting what you expect, when you expect it.

Acceptance criteria and testing
