On June 4, 1996, the maiden flight of Ariane 5 (flight 501) failed 40 seconds after takeoff with the explosion of the launcher. The amount lost was $500 000 000: five hundred million dollars. Uninsured.
The explosion resulted from a software error in the Inertial Reference System (SRI, from the French Système de Référence Inertielle) software module. Ariane 5 reuses much of the hardware and software of Ariane 4, her older sister; in particular, the SRI modules were kept as is, since they had proved perfectly reliable over the previous ten years.
Nonetheless, 37 seconds after takeoff, both SRI software modules
detected an overflow: the horizontal bias of the flight, held as a
64-bit floating-point value, no longer fit in a 16-bit signed integer.
The exception handling mechanism was triggered, but since this overflow
was not caught, the default exception handler of both modules concluded
that an impossible situation had occurred, diagnosed severe problems,
and shut the modules down. As a result, Ariane veered abruptly and the
launcher duly committed suicide, although its trajectory was perfect.
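The failing operation can be sketched as follows. This is a minimal Python illustration, not the flight code (which was written in Ada): `to_int16` and `OperandError` are hypothetical names for an unprotected conversion that raises when the value falls outside the 16-bit signed range, as the SRI conversion did, and `to_int16_saturating` is one possible protected alternative that clamps instead of failing. The sample values are invented.

```python
# Bounds of a 16-bit signed integer.
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


class OperandError(ArithmeticError):
    """Hypothetical exception: the value does not fit in 16 bits."""


def to_int16(x: float) -> int:
    """Unprotected conversion: raise if x is outside the 16-bit range."""
    n = round(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OperandError(f"{x} does not fit in a 16-bit signed integer")
    return n


def to_int16_saturating(x: float) -> int:
    """Protected alternative: clamp x to the 16-bit range instead of failing."""
    return max(INT16_MIN, min(INT16_MAX, round(x)))


if __name__ == "__main__":
    print(to_int16(12_000.0))             # fits: prints 12000
    print(to_int16_saturating(40_000.0))  # clamped: prints 32767
    try:
        to_int16(40_000.0)                # out of range: raises
    except OperandError as err:
        # In the SRI nothing caught the equivalent error, so the
        # default handler shut the module down instead.
        print("caught here, but not in the SRI:", err)
```

Clamping keeps the module running but produces a wrong value, so it is a design choice that depends on how the result is used, not a universal fix.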
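The SRIs failed during the very first flight of Ariane 5, although they had worked perfectly on Ariane 4. How come? Because the trajectories of the two generations of Ariane are completely different: early in its flight, Ariane 5 reaches much higher horizontal velocities, which pushed the horizontal bias beyond anything Ariane 4 had ever produced.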
The overflow was not detected during simulations and testing. How come? Because the testing had also been performed using Ariane 4 trajectories.
The SRIs were still running 37 seconds after takeoff, even though they are useful only before it. How come? The two SRIs would normally be shut down 9 seconds before takeoff, but in case the countdown is held just before takeoff, they are kept alive until 50 seconds after takeoff; this avoids having to reset them, which would delay the whole launch by hours.
They trusted a single pair of modules, whereas everybody knows that
critical systems in planes are triplicated and a vote is taken on the
results. How come? Because the SRIs were indeed doubled, but this was a
mere duplication: both twins ran the same software on the same data, so
both correctly detected the same failure at practically the same time.
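Why duplicating identical software does not help can be sketched in a few lines. Majority voting over replicas masks a fault that strikes one unit independently (say, a hardware glitch), but not a common-mode fault such as a bug present in every copy of the software: given the same input, every replica fails the same way and there is nothing left to vote on. The replica functions and the `FAILED` marker below are hypothetical stand-ins, not SRI code.

```python
from collections import Counter

FAILED = "FAILED"  # marker recorded when a replica raises an exception


def run_replicas(replicas, x):
    """Run every replica on the same input, recording failures instead of crashing."""
    results = []
    for replica in replicas:
        try:
            results.append(replica(x))
        except Exception:
            results.append(FAILED)
    return results


def majority_vote(results):
    """Return the value agreed on by a strict majority, or None if there is none."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 > len(results) and value != FAILED:
        return value
    return None


def healthy(x):
    return 2 * x


def glitchy(x):
    return 2 * x + 1  # hypothetical independent fault in a single unit


def buggy(x):
    if x > 100:
        raise OverflowError("value out of range")  # same bug in every copy
    return 2 * x


if __name__ == "__main__":
    # One faulty unit out of three: the vote masks it and returns 20.
    print(majority_vote(run_replicas([healthy, healthy, glitchy], 10)))
    # Three copies of the same buggy software: all fail together, so None.
    print(majority_vote(run_replicas([buggy, buggy, buggy], 200)))
```

Voting only pays off when replica failures are independent; protecting against a software bug would require replicas that differ in their software, not just in their hardware.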
The recommendations given by the committee of experts which analyzed the failure include:
- improvement of the representativeness (vis-a-vis the launcher) of the qualification testing environment;
- introduction of overlaps and deliberate redundancy between successive tests (i) at equipment level, (ii) at stage level, (iii) at system level;
- switch-off or inhibition of the SRIs after lift-off;
- testing to check the coverage of the SRI flight domain;
- general improvement of representativeness through systematic use of real equipment and components wherever possible;
- simulation of real trajectories on SRI electronics.
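The difference between replaying a recorded trajectory and checking the coverage of the flight domain can be illustrated with a small sketch. Assume, hypothetically, that a specification gives bounds for the value fed to a 16-bit conversion. Because rounding is monotone, the range check holds on a whole interval as soon as it holds at both ends, so testing the two bounds covers every value in between, whereas a replayed trajectory only exercises the values it happens to contain. All names and numbers below are invented for illustration.

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def fits_int16(x: float) -> bool:
    """True if x, once rounded, fits in a 16-bit signed integer."""
    return INT16_MIN <= round(x) <= INT16_MAX


def trajectory_ok(samples) -> bool:
    """Replay-style test: checks only the values present in one trajectory."""
    return all(fits_int16(x) for x in samples)


def domain_ok(lo: float, hi: float) -> bool:
    """Domain test: rounding is monotone, so checking both bounds covers [lo, hi]."""
    return fits_int16(lo) and fits_int16(hi)


if __name__ == "__main__":
    old_trajectory = [0.0, 5_000.0, 12_000.0, 20_000.0]  # invented sample values
    print(trajectory_ok(old_trajectory))  # True: every sampled value fits
    print(domain_ok(0.0, 60_000.0))       # False: the declared domain does not fit
```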