Sudoku Checker and the Minimum Number of Clues Problem

Summary:  We provide a computer program named checker that searches a Sudoku grid for a puzzle with a given number of clues.  The user provides a completed Sudoku grid, and a number n, and the program checks if there are any n clues that uniquely determine the Sudoku grid.  We also include a sudoku solver program, with support for Sudoku-X (diagonal Sudoku) and Sudoku-DG (disjoint groups).  Full C++ source code and Windows executables are available for download below.

Note: This page is outdated.  The conjecture that no 16-clue puzzle exists is now proven.

Introduction

An interesting thing about Sudoku is that there is an unsolved problem:

Find the smallest number of clues (given numbers) that a Sudoku puzzle can have.

Currently the fewest known is 17.  It is unknown whether there exists a puzzle with 16 clues.  In order to search for a 16-clue puzzle, some people do a random search.  We wrote a program named checker to search exhaustively through a grid. 

This program will search a completed Sudoku grid for a puzzle with n clues.  The program checker is more useful for exhaustively searching a grid than for constructing puzzles.  Checker will search a puzzle entirely (or partially, if desired) and keep track of any puzzles found.  If no puzzle with n clues is found, checker will say so at the end of the search.  Thus one may consider checker as answering the question "does this grid have a puzzle with n clues?"  In a sense, checker answers yes or no.

Strategy of the program

The strategy we use for searching a given Sudoku grid for a puzzle with n clues is as follows: 

  1. Find some unavoidable sets;
  2. Enumerate all hitting sets of size n, i.e., enumerate all sets of n clues that intersect all the unavoidable sets found in part 1;
  3. Check if any of these hitting sets uniquely determine the grid.

We explain each of these points in more detail now.

  1. An unavoidable set in a completed Sudoku grid is a set of cells (and their digits) such that some permutation of the cells results in another valid completed grid.  An equivalent definition is this: a subset of a Sudoku grid is unavoidable if and only if deleting all the clues in the subset from the Sudoku grid results in a Sudoku puzzle that is not uniquely solvable, i.e. a puzzle that has at least two different completions.  An unavoidable set is said to be minimal if no proper subset is itself unavoidable.  Usually when we say unavoidable set we mean minimal unavoidable set.

    The unavoidable sets are found in two steps.  First, we find all unavoidable sets of size less than or equal to 12.  Checker contains a "blueprint" of each type of unavoidable set of size not exceeding 12 and finds sets matching each type in the given grid.  Typically this gives at least 200 unavoidable sets.  (We have not actually proved that we find all unavoidable sets of size up to 12, but we believe this to be true.)

    Second, we incorporate the logic of two programs written by Guenter Stertenbrink, nppcsu and unav36, which find unavoidable sets of a special type.  We use only those sets of sizes 13 to 14 found with this program, as bigger unavoidable sets usually do not help when enumerating hitting sets in step 2.

  2. To compute the hitting sets, we first find a maximal collection of pairwise disjoint unavoidable sets.  (This uses a max clique algorithm from graph theory.)  Suppose this maximal collection is of size m.  We call m the MCN (maximal clique number) of the grid.  We draw all possible m-tuples from these m disjoint sets.  For each such m-tuple, we see if all our unavoidable sets are hit.  If not, take the first unavoidable set not hit, and draw the next clue from this set (and do this in all possible ways).  Repeat until we have hit all unavoidable sets.  If at some stage we find we have drawn n clues but still do not hit all unavoidable sets, we just move on to the next tuple.

    It is pretty clear that the larger the MCN is, and the more unavoidable sets we have, the better, although only small unavoidable sets are actually helpful in practice, which is why we collect unavoidable sets of size up to 14 only.  However, if so desired, one may feed unavoidable sets from an external source to checker, and these sets are allowed to be of any size.

  3. Each hitting set of size n is then run through a solver, to see if it uniquely determines the given Sudoku grid.  The solver we deploy was written by Guenter Stertenbrink.

Running checker

Checker is a command line tool (both on Unix/Linux and on Windows).

There are five binaries/executables: checker64, checker128, checker192, checker256, and checker320.  They are all the same except they use a different number of unavoidable sets for the enumeration of hitting sets.  checker64 uses up to 64 additional unavoidable sets (additional in the sense that these are not members of the maximal clique of disjoint unavoidable sets), checker128 up to 128, checker192 up to 192, checker256 up to 256, and checker320 uses up to 320 unavoidable sets.

The reason why we provide all these different versions of checker is that, although using more unavoidable sets will speed things up in theory, in practice there is a performance price to pay for using more unavoidable sets, and sometimes this price is larger than the actual gain of using more unavoidable sets.  Therefore, it takes a bit of experimenting to see which version is the best for the grid at hand.  See the section Running Time for more details.  It turns out that checker192 is the best choice for the majority of grids, although sometimes using more or fewer unavoidables sets can result in a shorter search time.  The -random command-line argument (see below) is helpful for deciding which version to use. 

The syntax (of all versions) of checker is as follows (run checker with no parameters to get this):


  checker <gridfile> <n> [-pseudos] [-solve <percentage>]
          [-firstclues <m>] [-uquick] [-random]
          [-unav <file w/ unavoidables>] [-x] [-dg]
          [-stats]

Checks if there are any n clues that uniquely determine the given Sudoku grid.

<gridfile> — text file containing a completed 9x9 Sudoku grid
<n> — the n to search for, must be in the range from 10-30
-pseudos — also search for pseudo-(n-1)-puzzles (with two completions)
-solve <percentage> — percentage of puzzles to be solved, where the default value is 100 (exhaustive search)
-uquick — find only a small collection of unavoidable sets (which is quicker) 
-random — do a random search (useful for obtaining a reliable estimate of the total search time relatively quickly)
-unav <file w/ unavoidables> — file containing further unavoidable sets of the grid (see the paragraph External Unavoidable Sets below for details)
-firstclues <m> — allows to select the first m clues thereby performing a partial search only (useful to split the search into several jobs)
-x — use Sudoku-X rules (Sudoku-X = diagonal Sudoku)
-dg — use Sudoku-DG rules (DG = disjoint groups)
-stats — print some statistics (currently only one item: time spent in solver)


Checker will save the first 500 puzzles and pseudo-puzzles found to a text file that can be viewed while checker is running.  (The number 500 can be increased by modifying the constant MAX_PUZZLES near the top of the source file checker.cpp and then recompiling checker.)  Also, the -firstclues option can be used to distribute search jobs over several computers.

External Unavoidable Sets

The file format to be used for external unavoidable sets is as follows.  All unavoidable sets must be stored in a single text file, one set per line, where each line is given in either of the following two formats: 
  1. {x_1,x_2,...,x_n}, where the x_k are the coordinates of a cell given in the form ij, where i is the row and j is the column, both running from 1-9 (rows from top to bottom, columns from left to right).
    Example.  {12,16,32,36} is an unavoidable set of size 4 consisting of two cells in the first row and two in the third row, namely those in columns 2 and 6.
  2. k a_1 a_2 ... a_k, where k is the number of elements in the unavoidable set, and the a_j's are the actual elements.  However, here the elements are not specified as in the previous case, but rather by numbering the cells in a 9x9 Sudoku grid from 0-80, top to bottom, left to right.
    Example.  4 1 5 19 23 is the same unavoidable set as above. 

Unavoid

As a companion to checker we provide unavoid which also takes a completed Sudoku grid and finds some unavoidable sets.  Actually, unavoid finds the same unavoidable sets as checker, but does not enumerate the hitting sets etc. 

The main purpose of unavoid is to do some preliminary analysis, as it outputs more information about the unavoidable sets found than does checker.  Also, unavoid works in batch mode, which means it can process as many grids as the user wants in one go. 

This is the syntax of unavoid (run unavoid with no arguments to get this):


  unavoid <gridfile> [-p] [-mcn] [-uquick]
          [-unav <file w/ unavoidables>] [-x] [-dg]

Finds unavoidable sets and computes the MCN for the given Sudoku grid.

<gridfile> — text file containing a completed 9x9 Sudoku grid
-p — print all unavoidable sets
-mcn — print MCN only (MCN = max clique number, the maximum number of disjoint unavoidable sets), cannot be used in conjunction with -p
-uquick — find only a small collection of unavoidable sets (which is quicker) 
-unav <file w/ unavoidables> — file containing further unavoidable sets of the grid (see the paragraph External Unavoidable Sets above for details)
-x — use Sudoku-X rules (Sudoku-X = diagonal Sudoku) 
-dg — use Sudoku-DG rules (DG = disjoint groups) 


Solver

Since one sometimes needs to solve Sudoku grids with the computer, we provide a modified version of Guenter Stertenbrink's public domain solver.  It is a very fast solver, and the version we provide also supports Sudoku-X (diagonal Sudoku) and Sudoku-DG (disjoint groups).

Running solver with no parameters gives the syntax:


  solver <file> [-count] [-all] [-x] [-dg]

Solves the Sudoku puzzles in <file> and by default prints the first solution found for each puzzle.

-all — print all solutions that exist
-count — do not print any solutions, but only count them
-x — use Sudoku-X rules (Sudoku-X = diagonal Sudoku)
-dg — use Sudoku-DG (disjoint groups) rules


The input file may contain more than one puzzle.  Each puzzle is given in a new line.

Downloads (last updated on 14 November 2006)

We provide the full C++ source code as well as precompiled binaries for Windows.  To get started, please read the file "readme.txt" which is included with the download.  The software is copyrighted, but comes under a very liberal license, see the file "license.txt" included with the download.  Also included are ten sample Sudoku grids, including one X-grid (diagonal Sudoku) and one DG-grid (disjoint groups).

Unix/Linux version: checker.tar.gz (65 KB)
Windows version: checker.zip (626 KB)

If you are interested in a solver program for Windows only, then just download the file solver.exe (44 KB) or solver.zip (20 KB).  For help geting started using the solver, please read the Solver section on this page.

Note: this software is provided "as is", with absolutely no warranty.

The Running Time

There are three main factors that affect the running time of checker.

The first is the MCN of the grid.  This is beyond the user's control, once the grid is chosen.

The second factor is n, the number of clues wanted.  If n<15 or n>20 the program is usually quite fast — if n>20 a puzzle is usually found quickly, and if n<15 the grid is exhaustively searched pretty quickly and no puzzle found.  Values of n in the range 15-20 are the most time consuming.  It is these values for which checker is intended.

The third factor is the number of unavoidable sets used.  This is within the user's control.  This matter is quite delicate, and the optimal number of sets to use seems to depend on the grid, or at least on the MCN.  We have found that for grids with a small MCN, using about 200 or 300 unavoidable sets makes a big difference over using about 50 sets.  For grids with large MCN, the advantage is small, so we might recommend using only 50-100 sets.  Experimentation is needed, and there is room for improvement to checker here.  There is an overhead to using a large number of unavoidable sets, which is that the enumeration of hitting sets takes longer as mentioned earlier. 

Here are some very rough estimates of the running time for each MCN, searching for a 16 clue puzzle, using additional unavoidable sets as discussed, on a Pentium IV 3 GHz machine.

MCN Running Time (rough estimate)
16 20 minutes
15 40 minutes
14 1 hour
13 2 hours
12 4 hours
11 6 hours
10 8 hours
9 10-12 hours
8 1 day

Sometimes most of the computation time is taken up with the enumeration of the hitting sets, and other times most of the computation time is taken up with solving.  The former happens when the difference between the n one searches for and the MCN of the grid is small.  As we increase n (and hold the grid fixed), the proportion of time spent in the solver grows quickly.  The -stats switch of checker is useful for experimenting with this.

To summarise, finding the optimal running time is a balancing act, and there is no general rule at present — each grid has to be considered individually. 

Future Improvements

Some ways in which checker could be speeded up:

  • Write a faster solver (although the one we currently use by Guenter Stertenbrink appears to be very fast already compared with other solvers available).
  • Add support for unavoidable sets requiring two digits (these are sometimes called 2d-unavoidables).

When Not To Use Checker

Although it is certainly possible to use checker for finding puzzles, it is not the best tool for this task — the purpose of checker is really to prove the non-existence of a puzzle with n clues.  If you are interested in finding puzzles, take a look at an actual puzzle generator, e.g. the one written by Glenn Fowler, see the Links section below.

Acknowledgments

We thank Guenter Stertenbrink for sharing his programs, Gordon Royle for sharing his unavoidable sets and his puzzles with 17 clues, Bo Haglund for modifying Guenter Stertenbrink's solver to support diagonal Sudoku, Ocean and Red Ed for helping out with the unavoidable sets, Glenn Fowler for working with us on checker, and Roger Wanamo for pointing out a mistake in unavoid.  We also wish to thank the many posters to the sudoku.com forum who contributed.

Links

History and Motivation

It all began when Gordon Royle found that this grid:

 6   3   9 
 2   8   4 
 5   1   7 
 2   4   1 
 7   6   5 
 9   8   3 
 7   8   5 
 1   9   3 
 6   2   4 
 1   2   3 
 7   9   6 
 4   5   8 
 8   5   7 
 4   3   2 
 6   1   9 
 9   4   6 
 8   5   1 
 2   3   7 
 3   4   2 
 8   6   1 
 9   7   5 
 1   7   8 
 5   9   4 
 3   2   6 
 5   6   9 
 3   7   2 
 4   1   8 

has 29 different puzzles in it which have 17 clues.  In other words, there are 29 different puzzles, each with 17 clues, whose solution is this grid. 

This is a high number, the highest known, so the grid became Strangely Familiar (SF) to readers of the Sudoku forum.  And so SF was considered a good candidate for a 16 clue puzzle.  If a completed grid has a 16, then it will have at least 65 different 17-clue puzzles.

Random searches for a 16 in SF were not finding anything.  So we wrote a program to exhaustively search the SF grid for a 16.  We did not find one.  So there is no 16-clue puzzle whose solution is the SF grid.  We turned that program into checker. (Checker was also later used to find all 17-clue puzzles in SF.  It turned out that no further 17-clue puzzles exist beyond the 29 such puzzles that were already known.)

The question of the minimum number of givens in Sudoku is not yet phrased in a mathematical way so as to become a problem in the mathematics of Sudoku.  Currently it is a problem in the programming of Sudoku.  When we understand it better we can perhaps solve it using mathematics. Most people believe that the answer to the minimum Sudoku question is 17.

Contact

Please send comments or questions to:


Page last updated on 14 November 2006.