Sudoku Checker and the Minimum Number of Clues Problem
An interesting thing about Sudoku is that there is an unsolved problem:
Currently the fewest known is 17. It is unknown whether there exists a puzzle with 16 clues. In order to search for a 16-clue puzzle, some people do a random search. We wrote a program named checker to search exhaustively through a grid.
This program will search a completed Sudoku grid for a puzzle with n clues. The program checker is more useful for exhaustively searching a grid than for constructing puzzles. Checker will search a puzzle entirely (or partially, if desired) and keep track of any puzzles found. If no puzzle with n clues is found, checker will say so at the end of the search. Thus one may consider checker as answering the question "does this grid have a puzzle with n clues?" In a sense, checker answers yes or no.
Strategy of the program
The strategy we use for searching a given Sudoku grid for a puzzle with n clues is as follows:
We explain each of these points in more detail now.
Checker is a command line tool (both on Unix/Linux and on Windows).
There are five binaries/executables: checker64, checker128, checker192, checker256, and checker320. They are all the same except they use a different number of unavoidable sets for the enumeration of hitting sets. checker64 uses up to 64 additional unavoidable sets (additional in the sense that these are not members of the maximal clique of disjoint unavoidable sets), checker128 up to 128, checker192 up to 192, checker256 up to 256, and checker320 uses up to 320 unavoidable sets.
The reason why we provide all these different versions of checker is that, although using more unavoidable sets will speed things up in theory, in practice there is a performance price to pay for using more unavoidable sets, and sometimes this price is larger than the actual gain of using more unavoidable sets. Therefore, it takes a bit of experimenting to see which version is the best for the grid at hand. See the section Running Time for more details. It turns out that checker192 is the best choice for the majority of grids, although sometimes using more or fewer unavoidables sets can result in a shorter search time. The -random command-line argument (see below) is helpful for deciding which version to use.
The syntax (of all versions) of checker is as follows (run checker with no parameters to get this):
checker <gridfile> <n> [-pseudos] [-solve <percentage>] [-firstclues <m>] [-uquick] [-random] [-unav <file w/ unavoidables>] [-x] [-dg] [-stats]
Checks if there are any n clues that uniquely determine the given Sudoku grid.
<gridfile> — text file containing a completed 9x9
Checker will save the first 500 puzzles and pseudo-puzzles found to a text file that can be viewed while checker is running. (The number 500 can be increased by modifying the constant MAX_PUZZLES near the top of the source file checker.cpp and then recompiling checker.) Also, the -firstclues option can be used to distribute search jobs over several computers.
External Unavoidable SetsThe file format to be used for external unavoidable sets is as follows. All unavoidable sets must be stored in a single text file, one set per line, where each line is given in either of the following two formats:
As a companion to checker we provide unavoid which also takes a completed Sudoku grid and finds some unavoidable sets. Actually, unavoid finds the same unavoidable sets as checker, but does not enumerate the hitting sets etc.
The main purpose of unavoid is to do some preliminary analysis, as it outputs more information about the unavoidable sets found than does checker. Also, unavoid works in batch mode, which means it can process as many grids as the user wants in one go.
This is the syntax of unavoid (run unavoid with no arguments to get
unavoid <gridfile> [-p] [-mcn] [-uquick] [-unav <file w/ unavoidables>] [-x] [-dg]
Finds unavoidable sets and computes the MCN for the given Sudoku grid.
<gridfile> — text file containing a completed 9x9
Since one sometimes needs to solve Sudoku grids with the computer, we provide a modified version of Guenter Stertenbrink's public domain solver. It is a very fast solver, and the version we provide also supports Sudoku-X (diagonal Sudoku) and Sudoku-DG (disjoint groups).
Running solver with no parameters gives the syntax:
solver <file> [-count] [-all] [-x] [-dg]
Solves the Sudoku puzzles in <file> and by default prints the first solution found for each puzzle.
-all — print all solutions that exist
The input file may contain more than one puzzle. Each puzzle is given in a new line.
Downloads (last updated on 14 November 2006)
We provide the full C++ source code as well as precompiled binaries for Windows. To get started, please read the file "readme.txt" which is included with the download. The software is copyrighted, but comes under a very liberal license, see the file "license.txt" included with the download. Also included are ten sample Sudoku grids, including one X-grid (diagonal Sudoku) and one DG-grid (disjoint groups).
If you are interested in a solver program for Windows only, then just download the file solver.exe (44 KB) or solver.zip (20 KB). For help geting started using the solver, please read the Solver section on this page.
Note: this software is provided "as is", with absolutely no warranty.
The Running Time
There are three main factors that affect the running time of checker.
The first is the MCN of the grid. This is beyond the user's control, once the grid is chosen.
The second factor is n, the number of clues wanted. If n<15 or n>20 the program is usually quite fast — if n>20 a puzzle is usually found quickly, and if n<15 the grid is exhaustively searched pretty quickly and no puzzle found. Values of n in the range 15-20 are the most time consuming. It is these values for which checker is intended.
The third factor is the number of unavoidable sets used. This is within the user's control. This matter is quite delicate, and the optimal number of sets to use seems to depend on the grid, or at least on the MCN. We have found that for grids with a small MCN, using about 200 or 300 unavoidable sets makes a big difference over using about 50 sets. For grids with large MCN, the advantage is small, so we might recommend using only 50-100 sets. Experimentation is needed, and there is room for improvement to checker here. There is an overhead to using a large number of unavoidable sets, which is that the enumeration of hitting sets takes longer as mentioned earlier.
Here are some very rough estimates of the running time for each MCN, searching for a 16 clue puzzle, using additional unavoidable sets as discussed, on a Pentium IV 3 GHz machine.
Sometimes most of the computation time is taken up with the enumeration of the hitting sets, and other times most of the computation time is taken up with solving. The former happens when the difference between the n one searches for and the MCN of the grid is small. As we increase n (and hold the grid fixed), the proportion of time spent in the solver grows quickly. The -stats switch of checker is useful for experimenting with this.
To summarise, finding the optimal running time is a balancing act, and there is no general rule at present — each grid has to be considered individually.
Some ways in which checker could be speeded up:
When Not To Use Checker
Although it is certainly possible to use checker for finding puzzles, it is not the best tool for this task — the purpose of checker is really to prove the non-existence of a puzzle with n clues. If you are interested in finding puzzles, take a look at an actual puzzle generator, e.g. the one written by Glenn Fowler, see the Links section below.
We thank Guenter Stertenbrink for sharing his programs, Gordon Royle for sharing his unavoidable sets and his puzzles with 17 clues, Bo Haglund for modifying Guenter Stertenbrink's solver to support diagonal Sudoku, Ocean and Red Ed for helping out with the unavoidable sets, Glenn Fowler for working with us on checker, and Roger Wanamo for pointing out a mistake in unavoid. We also wish to thank the many posters to the sudoku.com forum who contributed.
History and Motivation
It all began when Gordon Royle found that this grid:
has 29 different puzzles in it which have 17 clues. In other words, there are 29 different puzzles, each with 17 clues, whose solution is this grid.
This is a high number, the highest known, so the grid became Strangely Familiar (SF) to readers of the Sudoku forum. And so SF was considered a good candidate for a 16 clue puzzle. If a completed grid has a 16, then it will have at least 65 different 17-clue puzzles.
Random searches for a 16 in SF were not finding anything. So we wrote a program to exhaustively search the SF grid for a 16. We did not find one. So there is no 16-clue puzzle whose solution is the SF grid. We turned that program into checker. (Checker was also later used to find all 17-clue puzzles in SF. It turned out that no further 17-clue puzzles exist beyond the 29 such puzzles that were already known.)
The question of the minimum number of givens in Sudoku is not yet phrased in a mathematical way so as to become a problem in the mathematics of Sudoku. Currently it is a problem in the programming of Sudoku. When we understand it better we can perhaps solve it using mathematics. Most people believe that the answer to the minimum Sudoku question is 17.
ContactPlease send comments or questions to:
Page last updated on 14 November 2006.