Ordering the Tests

A test suite is a tool.
A test suite must be designed to assist you when something goes wrong. If you merely append test cases one after the other, then some day you will receive a huge log in which perhaps 90% of the tests failed. Obviously some low-level routine is not working properly for this configuration, but which one? With which of the hundreds of failing tests should you start? What is their most probable common origin?

Tests within a test suite should be structured just like the programs themselves: if the program consists of layers of modules, or simply layers of routines, then exercise the low-level layers first. In other words, test bottom-up. Then, when a test suite fails massively, chances are high that addressing the failures from the first to the last will be the shortest path to a properly fixed program.
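As an illustration of bottom-up ordering, here is a minimal Python sketch. It assumes the layering of the program is known as a dependency table; the module names (`strings`, `parser`, `codegen`, `cli`) are hypothetical and stand for no real package. It relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later) to place each module after the modules it depends on, and then reports the earliest failure in run order, which is the most likely place to start debugging.

```python
from graphlib import TopologicalSorter

# Hypothetical layering: each module maps to the modules it depends on.
DEPENDS_ON = {
    "strings": set(),
    "parser": {"strings"},
    "codegen": {"parser", "strings"},
    "cli": {"codegen"},
}


def bottom_up_order(depends_on):
    """Return the modules so that each one comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def first_failure(order, failed):
    """Return the earliest failing module in run order, or None."""
    for module in order:
        if module in failed:
            return module
    return None


order = bottom_up_order(DEPENDS_ON)
print(order)
# Three of the four layers fail; the lowest failing layer is reported.
print(first_failure(order, {"cli", "codegen", "parser"}))
```

With this table the run order is `strings`, `parser`, `codegen`, `cli`, so a log in which `parser`, `codegen` and `cli` all fail points at `parser` first.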

The patience of the user, and the growing likelihood that the test suite will be interrupted as it runs, must also be taken into account. If you have torture tests (and you should), then putting them last diminishes their chances of being run, and hence your chances of learning that your package fails under extreme conditions.

Unfortunately the two objectives, ordering programmo-morphologically and usero-impatiencely, are often incompatible, since torture tests usually involve as many parts of the software as possible, while bottom-up testing emphasizes single-component testing.

Autoconf faces this dilemma. Torture tests are critical for Autoconf, since they are meant to guarantee portability of complex requests across all the exotic systems some users have, and across all the creativity of some maintainers. If Autoconf hits a limitation of some sed implementation, this must be known before some fundamental package such as Emacs is found to be impossible to install on some systems. But failures of these torture tests are extremely hard to analyze...

In the case of Autoconf, we chose to exercise the most fundamental features first, then the torture tests, and finally the automatically generated tests, which are representative of the most typical uses. So far this order has proved efficient: grave failures are detected early, and only a couple of times has the failure of a torture test been understood thanks to tests run after it.
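The three-stage order described above can be sketched as a stable sort on a group rank. This is only an illustration in Python, not how the Autoconf test suite is actually assembled; the group labels and the test names are made up for the example.

```python
# Rank of each group, mirroring the order described above.
GROUP_RANK = {"fundamental": 0, "torture": 1, "generated": 2}


def run_order(tests):
    """Sort (name, group) pairs by group rank.

    sorted() is stable, so tests keep their original relative order
    within each group.
    """
    return sorted(tests, key=lambda test: GROUP_RANK[test[1]])


# Hypothetical tests, listed in no particular order.
tests = [
    ("typical-use-1", "generated"),
    ("torture-1", "torture"),
    ("basic-1", "fundamental"),
    ("torture-2", "torture"),
    ("basic-2", "fundamental"),
]

for name, group in run_order(tests):
    print(f"{group:12} {name}")
```

Keeping the rank in a single table makes the trade-off explicit: moving the torture tests earlier or later is a one-line change, while the order inside each group is left alone.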