Note: This is based on an old (2018) Quora answer.
If I asked you "How much hard drive space would be required for a Database representing every possible position in chess?" what would you answer?
Well, most answers go like this:
"Claude Shannon estimated there are 1043 positions in chess, which can be stored in ~32 bytes each, so it's something like 1044 bytes, and the observable universe has only 1080 atoms so it's something pretty large!"
To that I say phooey Shannon! It takes around 700 bytes if you want the fancy version.
Let's start by defining what the requested artifact is.
"A database representing every possible position in chess"
So, what is such a database's behaviour? I propose:
- It can, if given some sort of index value, return a specific position (which will always be the same)
- It can iterate over all chess positions.
- It should be able to return all positions matching a specific criteria like 'has a white pawn in D5'
My proposed implementation will do the first 2 which I consider actual requirements, the 3rd being just a "nice to have". While it's not impossible to add, even my willingness to do stupid things has a limit.
So, limiting myself to the first two requirements, if I fulfill those then it's done, right?
Because if the deliverable is correct, the rest is implementation details?
Let's implement it!
Step 1: notation
Welcome to Forsyth-Edwards notation (FEN for short). It's a wonderful thing that provides all the necessary information to restart a game.
I will use a hacked subset since all I want is the position (not things like "has white castled?" or "who is moving next?") but it's a trivial exercise to expand this database to do just that.
So, how does this database describe a position? A position is a list of pieces and their positions in a 8x8 board.
In the original FEN white pawn is identified as P and black pawn as p. I consider that an insult to the IETF who has adopted UTF-8 in RFC2777 (and also slightly racist), therefore I will use the proper glyphs: ♙ and ♟.
In fact, I will use ♙♘♗♖♕♔♟♞♝♜♛♚.
As for positions, since my database is small enough that data compression is pointless, let's just use a 64-character fixed-size string where each position is either a piece of a space meaning the square is empty.
So, a position is a string of length 64 where these are the only valid values: "♙♘♗♖♕♔♟♞♝♜♛♚ "
Step 2: implementation
It's obvious that each position is equivalent to a 64-digit number in base 13, which means there are 196 053 476 430 761 073 330 659 760 423 566 015 424 403 280 004 115 787 589 590 963 842 248 960 possible positions.
That 64-digit number is the index key for each position (it's an optimal key, it can't be made any smaller!)
So here's the database core code, offered with no comments since it's barely 12 lines of code
doing nothing weird:
def get_position(index):
def digit_to_char(digit):
return "♙♘♗♖♕♔♟♞♝♜♛♚ "[digit]
def str_base(number, base=13):
(d, m) = divmod(number, base)
if d:
result = str_base(d, base) + digit_to_char(m)
else:
result = digit_to_char(m)
return result
position = str_base(index).rjust(64)
return position
For convenience, here is a pretty printer for your boards:
def print_board(position):
print(" ABCDEFGH ")
for i in range(8):
print("%d%s%d" % (i, position[8 * (i) : 8 * (i + 1)], i))
print(" ABCDEFGH ")
Here you can see it in action (sadly asciinema butchers the alignment, it works properly in real life):
And here is the full source code:
def get_position(index):
def digit_to_char(digit):
return "♙♘♗♖♕♔♟♞♝♜♛♚ "[digit]
def str_base(number, base=13):
(d, m) = divmod(number, base)
if d:
result = str_base(d, base) + digit_to_char(m)
else:
result = digit_to_char(m)
return result
position = str_base(index).rjust(64)
return position
def print_board(position):
print(" ABCDEFGH ")
for i in range(8):
print("%d%s%d" % (i, position[8 * (i) : 8 * (i + 1)], i))
print(" ABCDEFGH ")
if __name__ == "__main__":
import sys
print_board(get_position(int(sys.argv[1])))
Please notice that this database also covers all alternative chess variants where extra pieces are given or removed as handicap.
Step 3
There is no step 3.