E.W.Dijkstra Archive: A Short Introduction to the Art of Programming (EWD 316), Chapter 9

9. The problem of eight queens

The problem is to make a program generating all configurations of eight queens on a chess board of 8 * 8 squares, such that no queen can take any of the others. This means that in the configurations sought no two queens may be on the same row, on the same column or on the same diagonal.

We don't have an operator generating all these configurations: this operator is exactly what we have to make. Now a (very general!) way to attack such a problem is as follows. Call the set of configurations to be generated A; look for a greater set B of configurations with the following properties
1)   set A is a subset of set B
2)   given an element of set B, it is not too difficult to decide whether it belongs to set A as well
3)   we can make an operator generating all elements of set B.

With the aid of the generator (3) for the elements of set B, the elements of set B can then be generated in turn; they will be subjected to the decision criterion (2) which decides whether they have to be skipped or handed over, thus generating elements of set A. Thanks to (1) this algorithm will produce all elements of set A.

Three remarks are in order.
1)   If the whole approach makes sense, set B is not identical to set A and as it must contain set A as a (true) subset, it must be larger. Nevertheless, it is advised to choose it "as small as possible": the more elements it has, the more of them have to be rejected according to criterion (2).
2)   We should look for a decision criterion that is cheap to apply, or at least the discovery that an element of B does not belong to A should (on the average) be cheap.
3)   The assumption is that the generation of the elements of set B is easier than a direct generation of the elements of set A. If, nevertheless, the generation of the elements of set B still presents difficulties, we repeat our pattern of thinking, re-apply the trick and look for a still larger set C of configurations that contains B as a subset etc. (The careful reader will observe that in the course of our solution this will indeed happen.)

Above we have sketched a very general approach, applicable to many, very different problems, Faced with a particular problem, i.e. faced with a specific set A, the problem is of course, what to select for our set B.

In a moment of optimism one could think that this is an easy matter, thinking of the following technique. We list all the mutually independent conditions that our elements of set A must satisfy and omit one of them. Sometimes this works but as a general technique this is too naive; if we want to see its shortcomings, we only need to apply it blindly to the problem of the eight queens. We can characterize our solutions by the conditions:
1) there are 8 queens on the board
2) no two of the queens can take eachother.

Omitting either of these conditions gives for the set B the alternatives
B1: all configurations with N queens on the board such that no two queens can take each other
B2: all configurations of 8 queens on the board
But both sets are so ludicrously huge that they lead to utterly impractical algorithms. We have to be smarter. How?

Well, at this stage of our considerations, being slightly "at a loss", we are not so much concerned with the efficiency of our final program but rather with the efficiency of our own thought processes! So, if we decide to make a list of the properties of solutions, in the hope of finding a useful clue, this is a rather undirected search; we should not invest too much mental energy in such a search, that is: for a start we should restrict ourselves to their obvious properties. Let us go ahead.
a)   No row may contain more than one queen; 8 queens are to be placed and the chess board has exactly 8 rows. As a result we can conclude that each row will contain precisely one queen.
b)   Similarly we conclude that each column will contain precisely one queen.
c)   There are fifteen "upward" diagonals, each of them containing at most one queen, i.e. 8 upward diagonals contain precisely one queen and 7 upward diagonals are empty.
d)   Similarly we conclude that 8 downward diagonals are occupied by one queen and 7 are empty.
e)   Given any non-empty configuration of queens such that no two of them can take each other, removal of any of these queens will result in a configuration sharing that property.

Now the last one is a very important property: in our earlier terminology it tells us something about any non-empty configuration from set B1. Conversely it tells us that each non-empty configuration from set B1 can be generated (in N different ways!) by extending a configuration from B1 with N-1 queens by another queen. We have rejected B1 because it was too large, but maybe we can find a suitable subset of it, such that each non-empty configuration is a one-queen extension of only one other configuration from the subset. This "extension property" suggests that we are willing to consider configurations with less than 8 queens and that we would like to form a new configuration by adding a queen to an existing configuration —a relatively simple operation presumably. Well, this draws our attention immediately to the generation of the elements of the (still mysterious) set B. For instance: in what order? And this again raises a question to which, as yet, we have not paid the slightest attention: in what order are we to generate the solutions, i.e. the elements of set A? Can we make a reasonable suggestion in the hope of deriving a clue from it?

Prior to that we should ask ourselves: how do we characterize solutions once we have them? To characterize a solution we must give the positions of 8 queens. The queens themselves are unordered, but the rows and the columns are not: we may assume them to be numbered from 0 through 7. Thanks to property a), which tells us that each row contains precisely one queen, we can order the queens according to the number of the row they occupy. Then each configuration of 8 queens can be given by the value of the integer array x[0:7], where
x[i] = the number of the column occupied by the queen in the i-th row.

Each solution is then "an 8-digit word" (x[0] ... x[7]) and the only sensible order in which to generate these words that I can think of is the alphabetical order.

Note. As a consequence we open the way to algorithms in which rows and columns are treated differently. At first sight this is surprising, because the original problem is completely symmetrical in rows and columns. We should be glad: to consider asymmetric algorithms is exactly what the above considerations have taught us!

Returning to the alphabetical order: now we are approaching familiar ground. If the elements of set A are to be generated in alphabetical order and they have to be generated by selecting them from a larger set B, then the standard technique is generating the elements of set B in alphabetical order as well, and to produce the elements of the subset in the order in which they occur in set B.

First we have to generate all solutions with x[0] = 0, then all with x[0] = 1 etc.; of the solutions with x[0] = 0 those with x[1] = 0 (if any) have to be generated first, then those with x[1] = 1 (if any), then those with x[1] = 2 (if any) etc. In other words: the queen of row 0 is placed in column 0 —say: the square in the top left corner— and remains there until all elements of A (and B) with queen 0 in that position have been generated, and only then is she moved one square to the right to the next column. For each position of queen 0, queen 1 will walk from left to right in row 1 —skipping the squares that are covered by queen 0—; for each combined position of the first two queens, queen 2 walks along row 2 from left to right, skipping all squares covered by the preceding queens, etc.

But now we have found set B! It is indeed a subset of B1: set B consists of

all configurations with one queen in each of the first N rows, such that no two queens can take each other.

Having established our choice for the set B, we find ourselves immediately faced with the task of generating its elements in alphabetical order. We could try to do this via an operator "GENERATE NEXT ELEMENT OF B" what would lead to a program of the following structure:

      INITIALIZE EMPTY BOARD;
      repeat GENERATE NEXT ELEMENT OF B;
             if N = 8 do PRINT CONFIGURATION 
      until B EXHAUSTED

but this is not attractive for the following two reasons.

Firstly, we don't have a ready-made criterion to recognize the last element of B when we meet it, and in all probability we have to generalize the operator "GENERATE NEXT ELEMENT OF B" in such a way that it will produce the indication "B EXHAUSTED" when it is applied to the last "true" element of B. Secondly, it is not too obvious how to make the operator "GENERATE NEXT ELEMENT OF B": the number of queens on the board may remain constant, it may increase and it may decrease.

So that is not too attractive. What can we do about it? As long as we regard the sequence of configurations from set B as a single sequence, not subdivided into a succession of subsequences, the corresponding program structure will be the single loop as in the program just sketched. If we are looking for an alternative program structure, we must therefore ask ourselves: "How can we group the sequences of configurations from set B into a succession of subsequences?".

Realizing that the sequence of configurations from set B has to be generated in alphabetical order, and thinking of the main subdivision in a dictionary —viz. by first letter—, the first grouping is obvious: by position of queen 0.

Generating all elements of set B —for the moment we forget about the printing of the elements that belong to the subset A as well— then presents itself in the first instance as

     h:= 0;
     repeat SET QUEEN ON SQUARE H;
            GENERATE ALL CONFIGURATIONS WITH QUEEN 0 FIXED;
            REMOVE QUEEN;
            h:= h + 1 
     until h = 8  ,

where the operations SET QUEEN and REMOVE QUEEN pertain to row zero, i.e. the first free row or the last occupied row respectively.

But now the question repeats itself: now do we group all configurations with queen 0 fixed? We have already given the answer: in order of increasing column position of queen 1, i.e.

     h1:= 0;
     repeat if SQUARE H1 FREE do
            begin SET QUEEN ON SQUARE H1;
                  GENERATE ALL CONFIGURATIONS WITH FIRST 2 QUEENS FIXED; 
                  REMOVE QUEEN 
            end;
            h1:= h1 + 1 
     until h1 = 8

where, again, SQUARE FREE and SET QUEEN pertain to the first free row and REMOVE QUEEN pertains to the last occupied row.

For "GENERATE ALL CONFIGURATIONS WITH FIRST 2 QUEENS FIXED" we could write a similar piece of program and so on: inserting then inside each other would result in a correct program with some eight nested loops, but they would all be very, very similar. To do so has two disadvantages:
1) it takes a cumbersome amount of writing
2) it gives a program solving the problem for a chess board of 8 * 8 squares, but to solve the same puzzle for a board of, say, 10 * 10 squares would require a new (still longer) program. We would like to avoid this by exploiting the similarity of the loops.

Then we have to answer two questions:
1) can we make the loops exactly identical?
2) can we then profit from their similarity?

The two exceptional cycles are the outermost one and the innermost one. The outermost one is different because it does not test whether the next square is free. There is, however, no objection to inserting this test: as it is only applied when the board is empty it is guaranteed to give the value true, and we can give the outermost cycle the same form by inserting the conditional clause

     if SQUARE H FREE do   .

The innermost cycle is exceptional in the sense that as soon as 8 queens have been placed on the board, there is no point in generating all configurations with those queens fixed, because we have a full board. Instead the configuration has to be printed, because we have found an element of set B that is also an element of set A. We can map the innermost cycle and the embracing seven ones upon each other by replacing the line "GENERATE" by

     if BOARD FULL then PRINT CONFIGURATION
                   else GENERATE ALL CONFIGURATIONS EXTENDING THE CURRENT ONE.

By now the only difference between the eight cycles is that each cycle has to have "its private h". By the time that we have reached this stage, we can give an affirmative answer to the second question. The sequencing through the eight nested loops can be provoked with the aid of a recursive procedure, "generate" say, which describes the cycle once. Using it, the program itself collapses into

     INITIALIZE EMPTY BOARD; 
     generate

while "generate" is defined recursively as follows:

procedure generate; 
begin integer h; 
      h:= 0; 
      repeat if SQUARE H FREE do
           begin SET QUEEN ON SQUARE H;
                 if BOARD FULL then PRINT CONFIGURATION
                               else generate; 
                 REMOVE QUEEN 
           end;
           h:= h + 1 
      until h = 8 
end.

Each activation of "generate" will introduce its private local variable h, thus catering for h, h1, h2, ... that we would need when writing 8 nested loops inside each other. SQUARE H FREE and SET QUEEN ON SQUARE H again refer to the first free row, the operation REMOVE QUEEN to the last occupied row.

Our program —although correct to this level of detail— is not yet complete, i.e. it has not been refined up to the standard degree of detail that is required by our programming language. In our next refinement we should decide upon the conventions according to which we represent the configurations on the board. We have already decided more or less that we shall use the

     integer array x[0:7]

giving in order the column number occupied by the queens. We need a separate convention to represent the number of queens on the board. Let us introduce integer n, such that

n = the number of queens on the board

x[i] = for O ≤ i < n; the number of the column occupied by the queen
in the i-th row.

The array x and the scalar n are together sufficient to fix any configuration of the set B, and those will be the only ones on the chess board. As a result we have no logical need for more variables; yet we shall introduce a few more because from a practical point of view we can make good use of them. The problem is that with only the above material, the analysis of whether a given square in the next free row is uncovered is rather painful and time-consuming. Here we can look for a standard technique, called "trading storage space versus computation time". The pattern of this technique is as follows.

In its most simple form we are faced with a computation that regularly needs the value of FUN(arg) where "FUN" is a given, computable function defined on the current value of one or more stored variables, collectively called "arg". In version 1 of a program, only the value of arg is stored and the value of FUN(arg) is computed when ever needed. In version 2, an additional variable, "fun" say, is introduced with the sole purpose of recording the value of "FUN(arg)" corresponding to the current value of "arg".

Where version 1 has

     arg:= ... (i.e. assignment to arg)

version 2 has (effectively)

arg:= ...; fun:= FUN(arg) ,

thereby maintaining the validity of the relation

fun = FUN(arg) .

As a result of the validity of this relation, wherever version 1 calls for the evaluation of FUN(arg), version 2 will call for the current value of the variable "fun".

The introduction of this redundant additional tabulated material is one of the programmer's most powerful ways of improving the efficiency of a program. Of course we need our ingenuity for its invention!

Quite often the situation is not as simple as that, and we come now to the second reason for introducing such a variable "fun". Often it is very unattractive to compute FUN(arg) from scratch for arbitrary values of arg while it is much easier to compute how the value of FUN(arg) changes when the value of arg is changed. In that case the adjustment of the value of fun is more intimately linked with the nature of the functional dependence and the history of the variable arg than is suggested by

arg:= ...; fun:= FUN(arg)

After this interlude on program optimization via trading storage space versus computation time, we return to our eight queens. The role of "arg" is played by the configuration on the board, but this value is not changed wildly, oh no, the only thing we do to it is adding or removing a queen. And we are looking for additional tables that will assist us in the decision as to whether a square is free, tables such that they can be kept up to date easily when a queen is added to or removed from the configuration.

How? Well, we might think about a boolean array of 8 * 8, indicating for each square whether it is free or not. If we do this for the full board, adding a queen implies dealing with up to 29 squares; removing a queen, however, is then a painful process because it does not follow that all squares no longer covered by her are indeed free: they might be covered by other queens. There is a standard remedy for this, viz. associating with each square not a boolean variable but an integer counter, counting the number of queens covering the square. Adding a queen means increasing up to 29 counters by 1, removing a queen means decreasing up to 29 counters by 1 and a square is free when its counter is zero. We could do it that way, but the question is whether this is not overdoing it: 29 adjustments is quite a lot.

Each square, in the freedom of which we are interested, covers a row (which is free by definition, so we need not bother about that) one of 8 columns (which must still be empty), one of 15 upward diagonals (which must still be empty) and one of the 15 downward diagonals (which must still be empty). This suggests that we should keep track of
1) the columns that are free
2) the upward diagonals that are free
3) the downward diagonals that are free.

As each column or diagonal is covered only once we don't need a counter for each, but a boolean is sufficient. For the columns we introduce a

          boolean array col[0:7]

where "col[i]" means that the i-th column is still free.

How do we identify the diagonals? Well, along an upward diagonal the difference between row number and column number is constant; along a downward diagonal their sum. As a result difference and sum respectively are the easiest index by which to distinguish the diagonals, and we introduce therefore

          boolean array up[-7:+7], down[0:14]

to keep track of which diagonals are free.

The question whether square[n,h] is free becomes

          col[h] and up[n-h] and down[n+h]  ,

setting and removing a queen both imply adjustment of three booleans, one in each array.

Without the tabulated material, REMOVE QUEEN would only consist of "n:= n - 1": now we would like to know her column number as well, i.e. we replace it by REMOVE QUEEN FROM SQUARE H. In the final program, the variable "k" is introduced for general counting purposes; statements and expressions are labelled for explicative purposes.

This completes the treatment of our problem; the program, incidentally, generates 92 configurations.

By way of conclusion I would like to make up the bill: the final solution is not very important (at least not more important than the problem of the eight queens). The importance of this section is to be found in the methods on which our final program relies, and the way in which we have found them.

1) The final algorithm embodies a very general technique, so general that it has a well-established name: it is called "backtracking". The configuration of set B can be thought of as placed at the nodes of a hierarchical tree, each node containing configuration C supporting the subtree with all the nodes with configurations C as a true sub-configuration. At the root of the tree we have the empty configuration (from which 8 different branches emanate). At each next level we find configurations with one queen more and at the top nodes (the leaves) we find the 92 solutions. The backtracking algorithm generates and scans the nodes of this tree in a systematic manner. I recommend the reader to become thoroughly familiar with the idea of backtracking, because it can be applied when faced with a great number of at first sight very different problems. (It is only when you recognize that they all ask for a solution by means of backtracking that the problems become boringly similar to eachother.)

2) If the only thing the student gains from this section is his becoming familiar with backtracking, he has learned something, but it was my intention to teach him more: we showed all the considerations which together can lead to the discovery of our method, this time backtracking. But it is my firm conviction that, when faced with a different problem to be solved by a different method, the latter may be discovered by a very similar method.

3) The final program contained a recursive procedure. But backtracking is by no means the only algorithmic pattern that is conveniently coded with the aid of recursion. The main point was the collection of considerations leading to the discovery that in this case recursion was a appropriate tool.

4) The major part of our analysis has been carried out before we had decided how (and how redundantly) a configuration would be represented inside the machine. It is true that such considerations only bear fruit, when, finally, a convenient representation for configurations can be found. Yet it is essential not to bother about the representation before it becomes crucial. There is a tendency among programmers to decide the (detailed) representation conventions first and then to think about the algorithm in terms of this specific representation, but that is putting the cart before the horse. It implies that any later revision of the representation convention implies that all thinking about the algorithm has to be redone; it fails to give due recognition to the fact that the only point in manipulating (such groups of) variables is that they stand for something else, configurations in our case.

5) The trading of storage space versus computation time is more than a trick that is useful in this particular program. It is exemplar for many of the choices a producing programmer has to make; he will work more consciously and more reliably when he recognizes them as such.

Exercise. Write two programs generating for N > 0 all N! permutations of the numbers 1 through N, one with and one without recursion, and establish the correctness of both programs.

Exercise. For 0 < N < M generate all integer solutions of the equations in c[1] through c[N] such that

1)   c[1] ≥ 0
2)   c[i] ≥ c[i-1] for 1 < i ≤ N
3)   c[1]+...+ c[N] = M   .

Again, write two programs, one without and one with recursion and establish their correctness.

begin
  integer n, k; integer array x[0:7]; 
  boolean array col[0:7], up[-7:+7], down[0:14]; 
    procedure generate; 
    begin integer h; h:= 0;
      repeat if SQUARE H FREE: (col[h] and up[n-h] and down[n+h]) do
        begin SET QUEEN ON SQUARE H;
              x[n]:= h;  col[h]:= false; up[n-h]:= false; 
              down[n+h]:= false; n:= n + 1; 
          if BOARD FULL: (n = 8) then 
          begin PRINT CONFIGURATION:
            k:= 0; repeat print(x[k]); k:= k + 1 until k = 8; newline
          end
                                 else generate; 
          REMOVE QUEEN FROM SQUARE H:
              n:= n - 1; down[n+h]:= true; up[n-h]:= true; col[h];= true
        end;
        h:= h + 1
      until h = 8
    end;
    INITIALIZE EMPTY BOARD: 
      n:= 0;
      k:= 0; repeat col[k]:= true; k:= k + 1 until k = 8;
      k:= 0; repeat up[k-7]:= true; down[k]:= true; k:= k + 1 until k = 15; 
    generate 
end

Next chapter: 10. A rearranging routine