Introduction

In the previous publication [1] we formalized the operational side of the algorithm-specification problem. Now we elaborate on what it means when one says they want to define an algorithm. In the most common sense, specifying a program takes the form of imposing restrictions on the algorithm's behaviour, thus creating an equivalence class of programs constrained by the same set of rules.

Formal specifications 101

When one says that $s$ is a sorting algorithm over sequences of $\Gamma$ elements according to the strict order $\prec$, the expected behaviour $\mathfrak M_s : \Gamma^* \rightharpoonup \Gamma^*$ can be formalized as follows:

  1. $\forall x \in \Gamma^*, \exists y = \mathfrak M_s(x)$, i.e. $\mathfrak M_s$ is total;
  2. $\forall x \in \Gamma^*, y = \mathfrak M_s(x), \forall \gamma \in \Gamma, \sum_{i=1}^{\left| x \right|} [x_i=\gamma] = \sum_{i=1}^{\left| y \right|} [y_i=\gamma]$, i.e. the output of $\mathfrak M_s$ is always a permutation of the input (written using Iverson bracket [2] notation);
  3. $\forall x \in \Gamma^*, y = \mathfrak M_s(x), \forall i,j \in \{1, \ldots, \left| x \right|\}, y_i \prec y_j \Rightarrow i < j$, i.e. the output of $\mathfrak M_s$ is always sorted.

These three first-order logic formulas together exhaustively describe the set of all sorting algorithms. In other words, every program that satisfies all three can be called a "sorting algorithm", and every sorting algorithm must satisfy all three.
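As a minimal illustrative sketch (not part of the original specification text), conditions 2 and 3 can be phrased as an executable check over a single input/output pair; condition 1 (totality) quantifies over all inputs and cannot be observed on a single run. The helper name `satisfies_sorting_spec` and the `precedes` parameter are assumptions made for this sketch.

```python
from collections import Counter

def satisfies_sorting_spec(x, y, precedes):
    """Check conditions 2 and 3 for input sequence x and candidate output y.

    `precedes(a, b)` encodes the strict order a ≺ b.
    """
    # Condition 2: y is a permutation of x, i.e. every symbol occurs in y
    # exactly as many times as in x (the two Iverson-bracket sums agree).
    if Counter(x) != Counter(y):
        return False
    # Condition 3: y_i ≺ y_j implies i < j; equivalently, there is no pair
    # of positions i >= j with y_i ≺ y_j.
    return not any(precedes(y[i], y[j])
                   for j in range(len(y))
                   for i in range(j, len(y)))
```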

The most important consequence of creating an equivalence class like the one above is the ability to dismiss every implementation detail of a specified algorithm when reasoning about its properties. One can design a complex program $p(\ldots, s, \ldots)$, embed an arbitrary sorting algorithm $s$ as a subroutine, and be sure that if $s'$ is also a sorting algorithm, then the reasoning about the properties of $p(\ldots, s', \ldots)$ will be exactly the same. It is important to understand, though, that such a consideration leads to indistinguishability of reasoning about program properties, not to indistinguishability of program behaviour, because an algorithm specification covers only the essential aspects and disregards the unimportant ones. For example, consider a partially ordered alphabet $\Gamma = \{\mathtt A, \mathtt B, \mathtt C, \mathtt D\}$:

  • $\mathtt A \prec \mathtt B \prec \mathtt D$,
  • $\mathtt A \prec \mathtt C \prec \mathtt D$,

where $\mathtt B$ and $\mathtt C$ are not comparable. Then at least two different sorting algorithms $s$ and $s'$ can emerge. Applying these algorithms to the same sequence $\mathtt{CDACB}$ produces two different outputs:

  • $\mathfrak M_s(\mathtt{CDACB}) = \mathtt{ABCCD}$
  • $\mathfrak M_{s'}(\mathtt{CDACB}) = \mathtt{ACCBD}$

Such discrepancies do not invalidate correctness: a proper specification does not restrict the algorithm's behaviour more than required. Consequently, the composite programs $p(\ldots, s, \ldots)$ and $p(\ldots, s', \ldots)$ can behave differently too, but any reasoning about their properties that relies only on the correctness of $s$ and $s'$ will be the same. This approach prevents one of the most dangerous classes of software engineering predicaments: the modular testing tarpit [3, page 4].
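To make the example concrete, here is a small illustrative encoding of this partial order; it reuses the hypothetical `satisfies_sorting_spec` check sketched earlier and confirms that both outputs above are admitted by the specification.

```python
# Transitive closure of A ≺ B ≺ D and A ≺ C ≺ D; B and C stay incomparable.
STRICTLY_BELOW = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")}

def precedes(a, b):
    """The strict partial order ≺ from the example."""
    return (a, b) in STRICTLY_BELOW

# Both outputs satisfy the specification for the same input, so the
# specification legitimately admits two behaviourally distinct algorithms.
assert satisfies_sorting_spec("CDACB", "ABCCD", precedes)
assert satisfies_sorting_spec("CDACB", "ACCBD", precedes)
```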

Modular testing tarpit

Consider the task of designing a program $\mathfrak M_p : X \rightharpoonup Z$ so that it implements a formally defined specification $\forall x \in X, \mathtt{test}(x, \mathfrak M_p(x)) = \mathtt{true}$, where $\mathtt{test} : X \times Z \to \mathrm{Bool}$ is computable and there exists a natural enumeration $(x_i \in X, i \in \mathbb N)$ of the domain that can be used to test the final solution for conformity with the specification. A high-level view of the development cycle of such a program looks like an incremental process: it starts with an empty program $p_0$, and repeated patching of the code, from each $p_i$ to $p_{i+1}$, improves its quality until the desired observable behaviour is reached [4]. With the natural enumeration of $X$, the quality of $p_i$ can be measured as the number of tests passed before the first failure is encountered. In these terms, the goal of the development is to reach an iteration $n$ at which the quality of $p_n$ becomes impossible to measure, because no failing test can be found.
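As a rough sketch of this quality metric (with `program`, `test`, and `enumeration` as hypothetical stand-ins for $\mathfrak M_{p_i}$, $\mathtt{test}$, and the natural enumeration $(x_i)$), one could write:

```python
import itertools

def quality(program, test, enumeration, budget=10_000):
    """Number of consecutive tests passed before the first failure.

    `enumeration` yields x_0, x_1, ... in the fixed natural order; `budget`
    only bounds the search, since a conforming program has no failing test
    to be found.
    """
    passed = 0
    for x in itertools.islice(enumeration, budget):
        if not test(x, program(x)):
            break
        passed += 1
    return passed
```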

This concept looks good, and frankly, it would not have become an industry standard without a reason: it is truly simple and approachable. However, this workflow has hidden drawbacks that emerge when it is scrutinized at the tactical level. In complex endeavours, patching $p_i$ into $p_{i+1}$ without quality degradation is a non-trivial process. It is difficult to think about a complex system as a whole, so this problem is usually tackled by dissecting big features into subtasks that are, in some sense, considered independently. Usually, the changes are narrowed to individual components instead of the entire codebase, and the quality of those changes relies on an individual reasoning about the implied quality of the whole $p$.

Consider $\mathfrak M_p := \mathfrak M_{\hat p} \circ \mathfrak M_{\check p}$ as a composition of two parts, a producer $\mathfrak M_{\check p} : X \rightharpoonup Y$ and a consumer $\mathfrak M_{\hat p} : Y \rightharpoonup Z$. The interface between these two parts is described as a countable set $Y$ of possible intermediate data structures passed from the producer to the consumer. Naturally, such modularization works only if it is possible to specify the behaviour of the components with testing functions $\mathtt{test/prod} : X \times Y \to \mathrm{Bool}$ and $\mathtt{test/cons} : Y \times Z \to \mathrm{Bool}$ such that $\forall x \in X, z \in Z, \mathtt{test}(x, z) \Leftrightarrow \exists y \in Y : \mathtt{test/prod}(x, y) \land \mathtt{test/cons}(y, z)$. Please note that we are talking about the theoretical existence of individual test functions as a formal prerequisite of proper modularization, not about an actual formalization, since in real-life cases the traditional development workflow does not consider it obligatory.

Now consider the case where $\mathtt{test/prod}$ and $\mathtt{test/cons}$ are formalized before the modules are separated. It then remains to implement the smaller parts of the task, so that for any $\mathfrak M_{\check p}$ adhering to $\mathtt{test/prod}$ and any $\mathfrak M_{\hat p}$ adhering to $\mathtt{test/cons}$, the composition $\mathfrak M_{\hat p} \circ \mathfrak M_{\check p}$ adheres to $\mathtt{test}$ as a consequence of proper specification design. However, the iterative process of program development is complicated by the fact that programs are not developed error-free.

While the quality of the producer can be defined in the same way as the quality of the whole program, by testing it against the natural enumeration of the domain $X$, determining the quality of the consumer turns out to be a non-trivial task. The intermediate domain $Y$ was defined arbitrarily, so it has no given natural enumeration. The quality of a consumer tested against some fixed enumeration $(y_i \in Y, i \in \mathbb N)$ has a questionable impact on the quality of the whole program, because the whole program applies $\mathfrak M_{\hat p}$ to the artificial enumeration $(\mathfrak M_{\check p}(x_i), i \in \mathbb N)$, which necessarily depends on the particular implementation of the producer. This leads to one of the most well-known and universal problems of iterative development of modular programs:
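The following sketch (all names are hypothetical) makes that dependency explicit: the only enumeration of $Y$ that is relevant to the whole program is the one induced by a concrete producer, so the measured quality of the consumer silently changes whenever the producer does.

```python
import itertools

def consumer_quality(consumer, producer, test_cons, enumeration, budget=10_000):
    """Quality of the consumer against the artificial enumeration (M_producer(x_i)).

    The result depends on `producer`: a patch that improves the producer's own
    quality may change the intermediate values y, so consumer tests that used
    to pass may start failing earlier.
    """
    passed = 0
    for x in itertools.islice(enumeration, budget):
        y = producer(x)                    # intermediate value, producer-specific
        if not test_cons(y, consumer(y)):
            break
        passed += 1
    return passed
```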

Lehman and Belady have studied the history of successive releases in a large operating system. They find that the total number of modules increases linearly with release number, but that the number of modules affected increases exponentially with release number. All repairs tend to destroy the structure, to increase the entropy and disorder of the system. Less and less effort is spent on fixing original design flaws; more and more is spent on fixing flaws introduced by earlier fixes. As time passes, the system becomes less and less well-ordered. Sooner or later the fixing ceases to gain any ground. Each forward step is matched by a backward one. Although in principle usable forever, the system has worn out as a base for progress. Furthermore, machines change, configurations change, and user requirements change, so the system is not in fact usable forever. A brand-new, from-the-ground-up redesign is necessary [3, page 122].

The problem described by Frederick P. Brooks Jr. in his monumental "The Mythical Man-Month" can, to a large part, be attributed to the inadequacy of the quality metric implied by testing. A patch that makes the producer pass a longer sequence of tests against the natural enumeration of the domain can come at the cost of changing its behaviour on the earlier tests. Even if such changes are valid within the specification of the producer, a consumer that was well tested against the producer's output at the previous development iteration may start encountering failures earlier, due to the changed testing environment. That is, under the quality metric of testing, an improvement of the producer may lead to a degradation of the whole program, which raises a concern about the validity of such a metric.

Escape from the tarpit

Here we raise an important question: if the paradigm of modular testing can sometimes fail, what could be an alternative? If validating a submodule's behaviour against the observed behaviour of its environment can be misleading, then verifying the submodule's behaviour against the specification of its environment is an option. Following the concept stated above, modularization should now begin with the dissection of the program specification into independent sub-specifications of its parts, and it needs to be taken to its logical endpoint: if the independently specified submodules are small enough to be verified by formal reasoning, the verification of the whole program becomes possible by the rules of logical inference.

Consider a specification $S = \{f : X \rightharpoonup Y \mid \ldots\}$ that is to be implemented as a program $p$ for an executable machine $\mathfrak M$. Instead of starting with an empty program $\mathfrak M_{p_0} \notin S$ and incrementally amending it up to $\mathfrak M_{p_n} \in S$ (a membership that cannot be established by testing), the specification $S$ itself is rewritten using rules of logical equivalence until a form that is trivial to implement correctly is reached. To describe the suggested procedure in detail, let us first introduce some notation:

  • $\left|p\right| \in \mathbb N$ denotes the complexity of a program $p$ under some metric (the number of distinct execution states, for example);
  • $\left|S / \mathfrak M\right| = \min_{p:\,\mathfrak M_p \in S}\left|p\right|$ denotes the complexity of the simplest implementation of $S$ on $\mathfrak M$;
  • $q(p_1, \ldots, p_k) : P \times \ldots \times P \to P$ denotes a program template: a computable function that aggregates several arbitrary programs into one;
  • $T(S_1, \ldots, S_k) : 2^{X \rightharpoonup Y} \times \ldots \times 2^{X \rightharpoonup Y} \to 2^{X \rightharpoonup Y}$ denotes a specification template: a non-computable function that aggregates several arbitrary specifications into one;
  • $q \models T$ denotes the relation of template congruity, interpreted as follows: $\forall p_1, \ldots, p_k \in P, S_1, \ldots, S_k \subseteq X \rightharpoonup Y, \mathfrak M_{p_1} \in S_1 \land \ldots \land \mathfrak M_{p_k} \in S_k \Rightarrow \mathfrak M_{q(p_1, \ldots, p_k)} \in T(S_1, \ldots, S_k)$.
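As an illustration not taken from the original text, sequential composition yields one natural congruent pair: take the program template $q_\circ(p_1, p_2)$ that runs $p_1$ and feeds its result into $p_2$, and the specification template $T_\circ(S_1, S_2) = \{ g \circ f \mid f \in S_1, g \in S_2 \}$. Then $q_\circ \models T_\circ$, because $\mathfrak M_{p_1} \in S_1$ and $\mathfrak M_{p_2} \in S_2$ give $\mathfrak M_{q_\circ(p_1, p_2)} = \mathfrak M_{p_2} \circ \mathfrak M_{p_1} \in T_\circ(S_1, S_2)$; the producer/consumer split from the previous section is exactly an instance of such a template.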

Let us introduce one important definition:

A machine $\mathfrak M$ is called reasonable if it has a finite system of congruent templates $q_1 \models T_1, \ldots, q_t \models T_t$ and a triviality bound $n \in \mathbb N$ such that every specification $S$ of complexity $n < \left|S / \mathfrak M\right| < \infty$ can be refined to $\emptyset \subset T_i(S_1, \ldots, S_k) \subseteq S$ for some $i \in \{1, \ldots, t\}$ with $\forall j \in \{1, \ldots, k\}, \left|S_j / \mathfrak M\right| < \left|S / \mathfrak M\right|$.

Informally, $\mathfrak M$ is called reasonable if every specification $S$ whose implementation is feasible and complex enough can be dissected congruently into a collection of sub-specifications $S_1, \ldots, S_k$, each of which is easier to implement than $S$. This property allows the following procedure to be used as an incremental development workflow:

  1. Start with a formal specification $S_\varepsilon = \{f : X \rightharpoonup Y \mid \ldots\}$ of the program's intended behaviour and put $\lambda := \varepsilon$ as the index variable of the recursion.
  2. If $\left|S_\lambda / \mathfrak M\right| \leq n$, then $p_\lambda : \mathfrak M_{p_\lambda} \in S_\lambda$ is trivial enough to find directly.
  3. If $\left|S_\lambda / \mathfrak M\right| > n$, then there is $\emptyset \subset T_i(S_{\lambda.1}, \ldots, S_{\lambda.k}) \subseteq S_\lambda$, $i \in \{1, \ldots, t\}$, where $\forall j \in \{1, \ldots, k\}, \left|S_{\lambda.j} / \mathfrak M\right| < \left|S_\lambda / \mathfrak M\right|$. For each $j$, apply the procedure recursively with $\lambda := \lambda.j$, producing $p_{\lambda.j} : \mathfrak M_{p_{\lambda.j}} \in S_{\lambda.j}$. By the relation of congruity, $\mathfrak M_{p_{\lambda.1}} \in S_{\lambda.1} \land \ldots \land \mathfrak M_{p_{\lambda.k}} \in S_{\lambda.k} \Rightarrow \mathfrak M_{q_i(p_{\lambda.1}, \ldots, p_{\lambda.k})} \in T_i(S_{\lambda.1}, \ldots, S_{\lambda.k})$, so we can settle on $p_\lambda = q_i(p_{\lambda.1}, \ldots, p_{\lambda.k})$ to have $\mathfrak M_{p_\lambda} \in S_\lambda$.
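A minimal sketch of these three steps as a recursive function; every parameter is a hypothetical stand-in for an object the text only postulates ($\left|S / \mathfrak M\right|$, the refinement into $T_i(S_1, \ldots, S_k)$, the program templates $q_i$, and the hand-picking of trivial programs).

```python
def refine(spec, complexity, bound, decompose, implement_trivial, templates):
    """Recursively turn a specification into a program on a reasonable machine.

    complexity(spec)        -- plays the role of |S / M|
    bound                   -- the triviality bound n
    decompose(spec)         -- returns (i, [S_1, ..., S_k]) such that
                               T_i(S_1, ..., S_k) is non-empty and within spec
    implement_trivial(spec) -- hand-picks a verified program for a simple spec
    templates[i]            -- the program template q_i
    """
    if complexity(spec) <= bound:
        # Step 2: trivial enough to implement (and verify) directly.
        return implement_trivial(spec)
    # Step 3: refine through a congruent template and recurse on the parts.
    i, sub_specs = decompose(spec)
    sub_programs = [
        refine(s, complexity, bound, decompose, implement_trivial, templates)
        for s in sub_specs
    ]
    # Congruity q_i |= T_i guarantees the aggregate satisfies `spec`.
    return templates[i](*sub_programs)
```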

As this procedure is structurally recursive, i.e. it descends only into strictly simpler specifications, it eventually stops, producing $p_\varepsilon$ as a tree-like application of branching templates to a collection of small hand-picked programs at the leaves. The soundness of such a design, i.e. $\mathfrak M_{p_\varepsilon} \in S_\varepsilon$, follows as a natural conclusion from three sets of premises:

  • proven congruity of the templates $q_1 \models T_1, \ldots, q_t \models T_t$;
  • proven inclusion at every recursive step, $\emptyset \subset T_i(S_{\lambda.1}, \ldots, S_{\lambda.k}) \subseteq S_\lambda$;
  • correct picking of all trivial snippets $p_\lambda : \mathfrak M_{p_\lambda} \in S_\lambda$.

If all components are in place, the resulting program comes with a vindication of its correctness that is much stronger than the assurance given by any finite test coverage, where a counterexample may simply not have been found yet, because a constructive proof exists. The situation of suddenly discovering a new input case that reveals a bug whose correction induces cascading degradations of behaviour in other parts of the program simply cannot arise. Of course, such insurance does not come free of cost: the programmable machine must be reasonable according to the definition above.

At Inferara, we believe in the following informally stated hypothesis:

All machines with sensible operational semantics are reasonable. For any programming paradigm that allows incremental development in a classic sense, it is possible to construct an exhaustive system of congruent templates that relate to intuitive rules of algorithm modularization. Moreover, the triviality bound for such a system is always manageably small.

Conclusion

The development workflow described here can be called verification-driven programming. This new concept is based on the observation that the classical workflow of test-driven development has unavoidable limitations that prevent programmers from reliably reaching a state of program correctness once the complexity of their task starts requiring non-trivial modularization of the algorithm. Here we propose a new method that allows machine-assisted confirmation that a program's semantics adheres to the developer's intentions, formulated as a tightly coupled hierarchy of sub-specifications descending from the most abstract formalization of observable behaviour to a fine-grained description of the algorithm's structure. We believe such an approach is an invaluable safety rope over the infamous modular testing tarpit that has been consuming countless human resources. It is feasible to implement programs correctly on the first attempt if an engineering group properly designs the reasoning set before the actual implementation.

References