Book Learning - a methodology to automatically tune an opening book

Abstract

This paper describes some experiments in "book learning" in the computer chess program Crafty, developed by the author. Crafty plays on the various Internet chess servers, and as a result, typically plays over 20,000 games per year. A favorite tactic of human chess players is to find a weak book line that a program will follow repeatedly, play that opening until they find a way to win, and then repeat the winning game many times. In years past, the author would simply edit the book by hand when such "cooks" were found, to prevent the program from playing them repeatedly. This paper describes an approach that automates this "book tuning" by letting Crafty determine which openings are good and which are bad, and adjust its book selections accordingly.

1. Introduction

Computer chess programs have used an "opening book" since the early days of the field, from Mac Hack through the 1970s and 1980s (Chess 4.x, Cray Blitz, Belle, HiTech) and into the 1990s. The opening book is an important part of a computer chess program, because it addresses several issues that are difficult to handle with other approaches.

This "opening book" is made by taking thousands (or hundreds of thousands) of GM games and creating a database of opening moves. Then, when your opponent plays a move you have in your database, you have a move to play in response, instantly. These moves are generally considered to be "good" because they were played by grandmaster players, and if they are played often enough, with good results, they are considered to be "opening theory" and are played whenever possible.

Without going into detail, the purpose of an opening book is to provide variability in games so that the program avoids playing a losing line repeatedly and also so that it avoids playing the same line repeatedly even if it wins. To accomplish this, it is common to build an opening book from a large collection of PGN games, but doing so runs the risk of a program following a bad line and losing, and then continuing to repeat this losing line.

It would seem, then, that a large collection of games is required to avoid repeating the same opening too often, yet such a collection can be bad because the games may well include serious blunders. Dealing with this by hand turns into a time-intensive task that takes time away from engine development.

The problem therefore reduces to maintaining a very large database of opening moves that have been played by strong players in tournament games. But since the database is so large, it is desirable that a chess-playing program take this database, try the various moves in real games, and then "learn" which are good and which are bad.

2. The Learned Value

The first issue to be addressed is choosing a method that decides whether a book line is bad or good. There are two obvious approaches.

2.1 Result-driven learning

Approach number one is called "result-driven" and simply looks at how the game ends. If the program loses, that opening line is called a "loser" and action can be taken to avoid playing it again (how this is done is described later). I chose to go beyond this approach because it is quite common for a book line to extend only 10-15 moves into the game, while most games last well beyond 50 moves. The question that must be answered is "was this game lost because of a bad opening, or was it lost because of poor play by the program?"

If the result-driven approach is taken, program weaknesses will cause incorrect opening selection. For example, imagine playing a super-GM such as Kasparov, who would be expected to win most games against most any program under tournament time controls. Such a method for deciding whether an opening is playable would eventually convince the program that no book moves are playable, based on the outcomes against Kasparov. While result-driven learning might be workable in a computer-versus-computer test scenario where many games are played, in general it can lead to problems.

Result-driven learning has one important short-term impact on the way Crafty uses the book. If someone tries an opening variation that has not been tried before, and if the program loses this game, it backtracks through the opening variation to find the last point where there were alternative moves. The move actually played is given a large negative learned value so that it will not be played again.

The advantage to this is that there are openings which are going to produce lots of trouble because the program simply does not understand what is required to be successful (yet). Using this algorithm, even if the score looked good after playing 10 moves beyond the end of the book variation, Crafty will still vary and not play that line again. This doesn't happen often enough to be bothersome, but it does prevent some cases where the normal learning algorithm decides that the book variation is playable, but the game is lost anyway, and might be lost again and again if Crafty continues to play that book move.
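A minimal sketch of this backtrack-and-penalize step is shown below; the function and parameter names are mine, not Crafty's, and the -99.0 value simply stands in for "a large negative learned value."

/* Sketch of the result-driven rule described above (names are          */
/* illustrative, not Crafty's).  learned[i] is the learned value stored */
/* with the book move played at ply i, alternatives[i] is how many      */
/* playable book moves existed at that point, and nmoves is the number  */
/* of book moves actually played.  After a loss in a new variation,     */
/* walk back to the last position that offered a choice and bury the    */
/* move that was played there.                                          */
void ResultDrivenPenalty(float learned[], const int alternatives[],
                         int nmoves) {
  int i;
  for (i = nmoves - 1; i >= 0; i--)
    if (alternatives[i] > 1) {
      learned[i] = -99.0f;   /* large negative value: never play again */
      break;
    }
}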

2.2 Search-driven learning

Approach number two can be called "search-driven" and uses the early search results, obtained after following a book line, to determine whether the book line resulted in a game with reasonable chances or not. Crafty might still lose the game, but if it is "satisfied" with the position after leaving the opening book, then the loss might not be the fault of the book, rather it might be the fault of playing bad moves later in the game.

2.3 The learned value

When I started on this project, I originally planned to use the first real search after leaving the book, taking that value to determine whether the resulting position was suitable to play again or should be avoided. Unfortunately, I discovered that this did not produce accurate results, for several reasons.

The first problem I encountered occurred in gambit openings where Crafty was offered a gambit pawn and accepted it. The opponent then might play some move not in the book at all, Crafty would drop out of book and do the first search, and it would generally find that it could hold on to the pawn and remain material ahead. The learned result might show it 1/2 pawn ahead, for example, a score produced by being a full pawn up but with the opponent having a lead in development. If I used this value, Crafty would assume this gambit should be accepted whenever offered, even though a few moves later it had found that accepting this gambit led to a bad or lost position.

The second problem is the inverse of this. Crafty would offer a gambit, the opponent would accept it (which was known to be bad and might not even be in the book file), and the first non-book search would show a negative evaluation (again, assume -1/2 pawn, Crafty being down a pawn but with a big lead in development). It would incorrectly assume that this was a lost position and not play it again.

It became apparent that the first search result was not enough. I chose to spread this over several searches to try to "get a feel" for what is going on in the position. The current algorithm remembers each of the results from the first ten searches, and then looks at these values to determine what is happening. There are several possible things that can happen, as described below.

If Crafty accepts a gambit that it should not, the first search result is often going to show a positive score, because it will be ahead in material. But over the next 10 moves it will begin to "see the truth" and the search values will drop either quickly or slowly. The speed at which the evaluation drops is not nearly so important as the fact that the evaluation is continually declining move by move. Result number one, then, is a steadily declining evaluation for the first ten non-book moves. If this happens, the last search result (the lowest evaluation) is saved as the first approximation of the learned result.

If Crafty offers a gambit that is perfectly playable, the first search result is often going to show a negative score, because recovering the material might take several moves. In this case, the ten search results will show the opposite slope from case one, because the scores are steadily increasing. In this example, the last search result is again saved as the first approximation of the learned result.

The other case is one where there is an inflection point in the curve that fits the ten search results. For example, after accepting a gambit, the score might actually continue to rise for several moves before the effect of the gambit is realized and the score starts to drop. The opposite case also happens when offering a gambit, where the score continues to drop for several moves before the benefits of the gambit become apparent and the score starts to rise again. In these cases, the learning algorithm searches for the best (or worst) value that occurs after the slope of the curve stabilizes.
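One way to implement this selection rule is sketched below. This is not Crafty's exact code; it simply follows the three cases just described, using the scores of the first ten non-book searches.

#include <stdlib.h>     /* abs() */

/* sv[] holds the scores of the first ten non-book searches.  Return   */
/* the first approximation of the learned value.                       */
int PickLearnedValue(const int sv[10]) {
  int i, rising = 1, falling = 1, turn = 1, extreme;

  for (i = 1; i < 10; i++) {
    if (sv[i] < sv[i - 1]) rising = 0;
    if (sv[i] > sv[i - 1]) falling = 0;
  }
  /* cases 1 and 2: a monotonic trend; the last score is most truthful */
  if (rising || falling) return sv[9];
  /* case 3: an inflection point exists; find the last change in the   */
  /* sign of the slope, then take the largest swing reached after it   */
  for (i = 2; i < 10; i++)
    if ((sv[i] - sv[i - 1]) * (sv[i - 1] - sv[i - 2]) < 0) turn = i;
  extreme = sv[turn];
  for (i = turn + 1; i < 10; i++)
    if (abs(sv[i]) > abs(extreme)) extreme = sv[i];
  return extreme;
}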

At this point, the first approximation to the learned result is simply the search result chosen from the first ten non-book moves, as described above. However, there are complicating issues that make this value very difficult to use in its current state.

2.4 Trusting the learned value

Remember that this algorithm is designed to learn how to use the opening library while playing on a chess server. The servers allow many types of games to be played, from game in one minute to game in over two hours. The problem with using the learned result as chosen above is that Crafty might learn that an opening is quite bad while playing very fast games and then refuse to play it again, even at longer time controls where the search result would be more accurate due to the increased depth possible with more time. Since these learned results are essentially permanent, it became obvious that things learned while playing a very long game were more reliable than things learned while playing a game where a move might be made in less than one second.

To handle this case, the first approximation to the learned value is modified based on the depth of the search that produced it. As a result, Crafty now stores the first ten search values, and the depth for each of those first ten searches, so that the depth that goes with the learned value can be found. I chose to call a ten-ply search "normal" and then developed a multiplier that scales the learned value based on the depth of the search which produced this value (the multiplier is given in the LearnFunction() code in figure 2.1 below).

The next issue is that of the skill of the opponent. If the opponent is much weaker than Crafty, then Crafty should win easily, and using such learned results would be very misleading against stronger players. As a result, Crafty notices the rating of the opponent and compares this to its own rating (both are obtained from the chess server or from the human opponent if this is not a server-based game). It then uses this rating difference to modify the learned result.

This is complicated, however, because the learned result and rating difference mean different things depending on whether the opponent is better or worse than Crafty. The first case is a good learned result (score is positive). If the opponent is lower rated than Crafty, then this score should be distrusted, since Crafty should win even those games where the book line was a bad one. But if the opponent is rated higher than Crafty, then this score should be trusted even more than normal, because producing a good score against a better opponent means this book position gave the program exceptionally good chances.

Negative scores (bad learned results) are different. If the learned result is bad, and the opponent is rated lower than Crafty, then the bad result should be trusted more than normal, because we would expect that the program could take a bad position and still win, unless the position is really bad. On the other hand, negative scores against a stronger opponent are expected, and we don't want to avoid playing openings that a better player was able to win, because that is the expected result for most any opening position.

2.5 Handling blunders by the opponent

One other issue has to be addressed before we can produce the final learned result and apply it to the opening book line. In most chess games, but particularly in fast games, blunders (tactical mistakes) occur fairly frequently. In thinking about this, I became convinced that positive scores are not as trustworthy as negative scores, because a positive score can be produced by a human (opponent) mistake, while negative scores mean that Crafty had to play poorly to reach these bad positions. Since the program is not prone to blunders and mistakes, it became obvious that negative scores were fairly reliable, while positive scores were not. So the final step in turning the original learned value into its final form is to decide whether the score is trustworthy or not, based on its sign (positive or negative).
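In code, this reduces to a simple sign test. The sketch below is illustrative (the names are mine, not Crafty's); sv is the learned-value approximation as it is passed to LearnFunction() in figure 2.1 (expressed from White's point of view, per the comments there), and wtm is 1 if Crafty played White.

/* A score that is bad for Crafty is trusted; a score that is good for */
/* Crafty may only reflect an opponent blunder, so it is not trusted.  */
int TrustedScore(int sv, int wtm) {
  return (wtm && sv < 0) || (!wtm && sv > 0);
}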

Taking all of these elements into account, the following function was written to take the first approximation of the learned value and turn it into the final value used to learn.

/*  sv = search value (first approximation to learned value) */
/*  sd = depth of the search that produced sv                */
/*  rd = Crafty's rating - opponent's rating                 */
/*  trusted = 1 if confidence in this value is high.  This   */
/*  means if Crafty is white and the score is < 0, we trust  */
/*  it more; if Crafty is black and the score is > 0, we     */
/*  trust it more as well.                                   */

#define Min(a,b) ((a) < (b) ? (a) : (b))
#define Max(a,b) ((a) > (b) ? (a) : (b))

int LearnFunction(int sv, int sd, int rd, int trusted) {
  float rating_multiplier_trusted[11] =
    {.00625, .0125, .025, .05, .075, .1, .15, .2, .25, .3, .35};
  /* NOTE: the final (11th) entry below is assumed; the original */
  /* listing gave only ten values for an 11-element array.       */
  float rating_multiplier_untrusted[11] =
    {.25, .2, .15, .1, .05, .025, .012, .006, .003, .001, .0005};
  float multiplier;

  sd = Min(sd, 19);                   /* cap the depth scaling at 19 plies */
  rd = Max(Min(rd / 200, 5), -5) + 5; /* rating difference -> index 0..10  */
  if (trusted) multiplier = rating_multiplier_trusted[rd] * sd;
  else multiplier = rating_multiplier_untrusted[rd] * sd;
  sv = Max(Min(sv, 600), -600);       /* clamp to +/- 6 pawns (600 = 6.00) */
  return (sv * multiplier);
}

Figure 2.1:  LearnFunction()

This function works as follows. The search depth (sd) is limited to a maximum of 19 to avoid producing adjustments that would be too large. The rating difference (rd) is rounded to "classes" of 200 points, much like the USCF scale, constrained to the interval (-5,5), and then shifted to 0-10, which corresponds to a rating difference of +/- 1000 points. Next, before doing the calculation, we choose between two different sets of multipliers, based on whether this is a trusted score (negative) or not (positive). (Note that we can't simply rely on score < 0 being trusted, because the score can be positive and still be trusted if Crafty is playing black.) Finally, the learned value (sv) is limited to +/- 600 (6 pawns) to limit the range of the final product. After all of these values are computed, they are combined to produce the final learned result that reflects all of the discussion points above.
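As a concrete illustration (the inputs below are made up, not taken from a real game), consider a raw value of -253 produced by a 6-ply search against an equally rated opponent, with the trusted flag set:

#include <stdio.h>

int LearnFunction(int sv, int sd, int rd, int trusted);   /* figure 2.1 */

int main(void) {
  /* Hypothetical inputs, for illustration only:                 */
  /*   sv = -253 : raw learned value, about -2.5 pawns           */
  /*   sd = 6    : a shallow, fast-game search                   */
  /*   rd = 0    : equally rated opponent                        */
  /*   trusted = 1 : the score is bad for Crafty, hence trusted  */
  int lv = LearnFunction(-253, 6, 0, 1);
  /* rd/200 = 0 maps to index 5, so the trusted multiplier is    */
  /* .1 * 6 = 0.6, and lv = -253 * 0.6 truncated toward zero,    */
  /* i.e. -151                                                   */
  printf("learned value = %d\n", lv);
  return 0;
}

With these inputs the shallow search cuts the raw value roughly in half; the same position searched to 12 plies would have produced a multiplier of 1.2 and a learned value of about -303.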

Now we have a value that estimates the position that results from following the book line being studied. This value has been adjusted based on the search depth used to produce it, plus the rating difference between Crafty and the opponent. The next step is to apply this in some way so that the program will play this opening again if the value is good, and so that it will not play it if the value is bad.

3. Applying the Learned Value

Once Crafty has produced the learned value, the next issue becomes how to apply this value to affect the way it chooses book lines. The current approach is actually quite simple to follow.

The idea is to look at an opening from back to front, while still treating it as a tree. The tree has branches at any point, on the way back from the last move to the first, where there was more than one playable choice (playable means a move that could actually be played in a game, not a move flagged as "?" or one which has been learned to be bad).

We start with LV=LearnFunction(...) to get the initial value for LV. This value is stored with each move starting at the last move, and working back toward move 1. At each known book move, we store LV with the move, and then divide LV by the number of alternatives at this point. For book positions with only one playable move, LV remains constant, but whenever there are alternatives, moves prior to that point get a smaller LV value applied.
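A minimal sketch of this update follows; the names are illustrative, not Crafty's, and an existing learned value is averaged with the new one, as described later in section 5.1.

/* learned[i] is the stored learned value for the book move played at  */
/* ply i, alternatives[i] the number of playable book moves that were  */
/* available there, nmoves the number of book moves played, and lv the */
/* value returned by LearnFunction().                                  */
void StoreLearnedValue(float learned[], const int alternatives[],
                       int nmoves, float lv) {
  int i;
  for (i = nmoves - 1; i >= 0; i--) {
    /* average with any previously learned result for this move */
    learned[i] = (learned[i] != 0.0f) ? (learned[i] + lv) / 2.0f : lv;
    /* each branch point weakens the value passed to earlier moves */
    if (alternatives[i] > 1) lv /= alternatives[i];
  }
}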

The effect of this algorithm tends to propagate really bad scores far back up the book line because these scores remain large even after the division, while small scores do not get to affect choices beyond the last one or two in the book line since the division reduces them to very small values. This effectively treats the book line as a tree, but in reverse order. The last point in the book where there was a choice of moves looks like the first branch point. The next branch point is closer to the root of the book line. This information is then used when searching forward through the book, so that Crafty can avoid choices which were learned to be bad, while favoring choices that were learned to be good. And the most important feature of this is that all of this is done automatically inside Crafty with no external "help" required.

4. Portability of Learning

One important goal of this experiment was to take advantage of the many copies of Crafty running on servers all over the world. I wanted to be able to take advantage of the things other copies of the program would learn, as well as be able to share what the original Crafty learns with other programs. To accomplish this, I chose to write a PGN file at the same time the book learning update operation is done. The complete game score is written, along with the learned value, search depth and rating difference between Crafty and the opponent for that game.

This file contains only simple ASCII text, which eliminates any sort of compatibility problems between different architectures, and it can be imported with a simple "import" command (built in to Crafty) which will read in the game moves, then pretend it has played the game and use the included learned value to update the book, just as if it had actually played that game and learned the result.
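As an illustration, a record in this file might look roughly like the one below. The tag and comment syntax shown is made up for this example and is not necessarily Crafty's exact format; the essential content is the game score plus the learned value, the search depth, and the rating difference.

[White "Crafty"]
[Black "opponent"]
[Result "0-1"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Nc3 axb5 5. Nxb5 Nf6 ... 0-1
{learned value = -1.35, depth = 6, rating difference = 0}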

This has one major benefit, based on the fact that most opening books are now being built from the millions of electronically readable PGN game scores available on the Internet. If you want to build your own book, you can locate PGN files and make your own book, based on specific players, specific events, or specific openings. If your book ends up with several hundred thousand games in it, you will discover that many of the moves played are bad and lead to quick losses by your program. If you import the learned results from Crafty, it has probably tried many of those same openings and you don't have to let yours stumble around to find out which openings are good and which are bad.

The overall result of this has been a form of "distributed learning" where copies of Crafty are playing games all over the world, and learning things about the opening as they play. These learned results can then be collected and imported into many different programs, so that these programs suddenly become as aware of those new openings as the program that actually first played them.

This has proven to be extremely useful in Internet play, since all of the programs can play using different opening libraries, and then share what they learn with other programs. In effect, this accelerates the learning speed dramatically, because thousands of programs are learning, and sharing this learned information quickly and easily.

It has been suggested that this could be turned into a "gang-learning" facility by adding a "learning server process" to a chess server. When a program learns something from the opening, it can transmit this to the "learning server" and when it is not playing a game, it can query the learning server to obtain new learned results from other programs. This would eliminate the current facility of using email to share the PGN book learning files and let one program learn something very quickly when another program loses by playing some new opening.

5. Experimental results

As an example, a common line from the Ruy Lopez is given below, with an obvious blunder thrown in. The main line here is 1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4. This example includes an odd choice at move 4 (4. Nc3), which loses the bishop by not retreating it as required. This blunder game was included in the input file three times with different choices at move 6, to illustrate how learning works.

The raw input used to construct this opening book is given below.

[this is a bad line]
e4 e5 Nf3 Nc6 Bb5 a6 Nc3 axb5 Nxb5 Nf6 Nc3 Bb4 d3 O-O O-O
[this is a bad line]
e4 e5 Nf3 Nc6 Bb5 a6 Nc3 axb5 Nxb5 Nf6 d3
[this is a bad line]
e4 e5 Nf3 Nc6 Bb5 a6 Nc3 axb5 Nxb5 Nf6 O-O
[this is the normal line]
e4 e5 Nf3 Nc6 Bb5 a6 Ba4 Nf6

5.1 Normal move selection with learning

Crafty then played a game using this book with the book selection options set to force Crafty to choose the first move that is playable. After playing this game for ten moves, Crafty discovered it was a piece down, with a learned score of -2.53 (before being modified by the search depth, rating, and so forth). After factoring in the search depth (which was very shallow for this test), this learned value was scaled down to -1.35 by the LearnFunction() procedure given in figure 2.1.

At this point, the game was played again, and the book statistics are given below for the two moves where Crafty has alternatives in the opening database, at moves 4 and 6. At move 4, this is what Crafty uses to choose a move in game two if this opening is played (the format of this information is (move, learned value)): {(Nc3,-0.45), (Ba4,0.00)}. At this point, Nc3 looks bad, but not bad enough to avoid, because the book selection option forces Crafty to sort the moves into order based on how frequently they were played and then pick the most frequent move unless the learned value is less than -.80.

Recall that the learned score is based on what happens below this point in the tree, and also recall that the original book input gives three alternatives two moves later. As a result, 1/3 of the original -1.35 was backed up to this book move. Two moves further along in the game, Crafty finds this book status: {(d3,0.00), (Nc3,0.00), (O-O?,-1.35)}. From this set of three possible book moves, Crafty has already determined that O-O will not be played because it has a learned result of -1.35, and has flagged that move with a "?" indicating that it appears to be a blunder.

In the first game, after playing 4. Nc3 and then 6. O-O, Crafty ended up with the bad learned score of -1.35 described above. In the second game, it gave up on trying 6. O-O, because the learned score is below the lower bound that makes a move not playable. Of course, 6. O-O is not the problem, but Crafty has not figured that out just yet. In this game, it therefore tries 6. d3, hoping to somehow save the piece. It then discovers that the raw learned result is -2.61, because it is still a piece down. After LearnFunction() modifies this score, Crafty remembers -1.35 for this move.

In game three, at move 4, the book statistics look like this: {(Nc3,-0.57), (Ba4,0.00)}. The move Nc3 has dropped further: in game two there were two playable alternatives at move 6, so 1/2 of that game's learned value was backed up to this move, where it was averaged in with the previous learned result to avoid branding a move as bad if it was playable in another game. Two of the three alternatives at move 6 now appear to be blunders, which leaves only one playable move further along. Crafty finds this book status at move 6: {(Nc3,0.00), (d3?,-1.35), (O-O?,-1.35)}. Notice that after game two, it decided that d3 was also not playable, leaving only the move 6. Nc3 as playable here. After trying this, it once again is faced with a learned value of -2.61, and it updates the book once again. However, this time things change further back in the book line, because at move 6 there was only one playable move left before this learning cycle was executed. This marks the last move as not playable, and the entire learned value is backed up to move 4, which is the next place (working backward) where there is a choice of moves.

In game four, at move 4 once again, the book statistics now look like this: {(Ba4,0.00), (Nc3?,-0.98)}. At this point, Crafty has learned that Nc3 loses material. It might not be the fault of this move, but at least it has learned that the alternatives at move 6 did nothing to prevent losing the bishop. As a result, it will no longer try Nc3? since it has learned, totally by itself, that this leads to a lost position.

That is an example of how this learning facility can automatically filter out bad moves without needing any sort of human intervention at all, which was one of the original goals leading to the development of this approach. Note that in this first example, the normal book move selection algorithm (based on simply playing the first move found) was used to choose moves from the book, but that the learned values eliminated moves that led to poor positions. This can be improved on as the next short example shows.

5.2 Improved move selection with learning

The above strategy takes too long to learn how to avoid losing due to a bad book line, because the bad move occurs in the tree above points where there are other choices. Each of the choices deeper in the tree has to be eliminated, one by one, before the move that ultimately caused the problem is finally eliminated.

For the second example, Crafty was set to sort moves based on the learned value and then to play the move with the largest learned value rather than the move most frequently played (unless the learned value was below the -.8 threshold, which would eliminate the move just as in the previous example). Note also that if all the learned values for a set of book moves are zero, Crafty reverts to using frequency at such positions, since it has not learned anything there yet.
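A sketch of this selection rule follows; the names and the exact tie-breaking are illustrative, not Crafty's code. Moves below the -.80 threshold are skipped entirely, the best learned value wins otherwise, and frequency breaks the tie when nothing has been learned yet.

#define NOT_PLAYABLE (-0.80f)   /* threshold below which a move is dead */

/* learned[] and frequency[] describe the n book moves at the current  */
/* position; return the index of the move to play, or -1 if no book    */
/* move survives.                                                       */
int SelectBookMove(const float learned[], const int frequency[], int n) {
  int i, best = -1, all_zero = 1;
  for (i = 0; i < n; i++)
    if (learned[i] != 0.0f) all_zero = 0;
  for (i = 0; i < n; i++) {
    if (learned[i] < NOT_PLAYABLE) continue;        /* learned blunder */
    if (best < 0) { best = i; continue; }
    if (all_zero ? frequency[i] > frequency[best]
                 : learned[i] > learned[best]) best = i;
  }
  return best;
}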

After the book was rebuilt to clear the learned values, the same game was played again. Since all of the learned values were zero at each point where there was a choice, Crafty reverted to the frequency of play, as before, to break the ties. The first game produced the same learned value as before.

In the second game, at move 4, Crafty found the following two alternatives: {(Ba4,0.00), (Nc3,-0.45)}. In this example, 4. Ba4 is ordered first, because it has the higher learned value, so that in the second game, Crafty avoids the blunder line completely.

There are potential problems with either approach that are discussed in the next section, but given enough time, Crafty can learn quite successfully which openings are good and which are bad, at least as those openings apply to its unique skill level. It should be noted that in certain cases the learned result can reflect more about the opponent's skill level than about Crafty's. It is certainly possible that a particular opening is not very good, but the opponent handles it very poorly, leading to what appears to be a good result for Crafty.

6. Potential problems

6.1 Difficult to play every book line

One non-obvious problem is that any opening book made from a raw input file containing over one million games is going to offer a generous variety of choices, and learning by playing these openings is going to take some time. The net result is that even playing as many as one hundred thousand games in a year covers only 10% of the possible book lines. If steps are not taken to reduce the number of alternatives, learning may never influence all of the games.

6.2 Learning can take too long

With the example given in section 5.1, the problem is quite obvious. It simply takes a long time, and a large number of games, to discover that a particular move is bad or leads to positions that Crafty doesn't handle very well. On the other hand, if a move is frequently played by GM players, it is unlikely that it is an outright losing move, although it certainly can lead to positions where a computer chess program might get into considerable difficulty. The positive side of this is that even if a move is popular, it will be eliminated eventually, if it leads to bad positions; and it is eliminated automatically without any outside intervention required, which was one of the original goals of this project.

6.3 Becoming too predictable

With example 5.2, the problem is not quite so obvious. When Crafty plays a game as white, it learns for both sides of course. That is, assuming it wins as white, then it also learns to not play that opening variation as black since black lost. But when it plays white again, it will tend to follow that same opening line again since it will have a positive learned result at each move for white. This leads to trouble, because if it repeats the same opening enough times, someone will learn how to beat it.

It turns out, then, that example two learns quickly to avoid losing lines, but it also then learns to repeat winning lines, until it loses. Then it will learn to avoid that line and move on to something fresh. If this is repeated often enough, and this actually happened during the early development of this algorithm, Crafty eventually learns that all openings lead to losses and it starts playing as though it has no opening library at all.

To avoid this, the normal book selection criteria include both the frequency of play for each alternative move and the learned result for each alternative. This still tends to avoid losing lines, but it also avoids repeating the same winning line over and over until it becomes a losing line.

Before this modification was added, Crafty generated quite a bit of amusement one evening after learning was added. Playing on the Internet Chess Club server, it played a GM that tried his favorite opening line against Crafty, and he won. He then played a second game, which reversed the colors, and, as expected, Crafty played that same opening for black since it knew it was good. The GM had a chuckle, and thought it quite amazing that it learned so quickly. For the next game, the GM tried a different opening which turned out to be OK for Crafty, but in game 4, he was greeted by his favorite opening line again. He asked "when will it stop playing this?" I was forced to respond (with a chuckle) "when you manage to beat it."

6.4 Learning is permanent

One problem with the overall approach is that this sort of learning is appropriate for the version of Crafty that produced it, but it may (or may not) be appropriate for future (and supposedly better) versions of Crafty. For example, early versions of Crafty handled bishops poorly, and learned that playing a g3/g6 type of opening led to positions that caused problems. Later, this problem was addressed by evaluation changes that improved the way it handled these positions. Unfortunately, it had already learned that such openings were to be avoided when possible, and the new improvements were never given a chance to influence games.

It is certainly possible that each time an opening move is skipped, its learned value could be pulled slightly closer to zero, so that after a move has been learned to be bad, it looks a little less bad every time it is passed over (we might call this "learning aging"). This would give program changes a chance to influence games, because every so often, moves that were once thought to be bad would slowly lose their "badness" and could eventually be tried again to see if the program has improved enough to understand the positions that gave it trouble previously.
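A sketch of what such a decay step might look like is given below; the decay rate and the "really bad" cutoff are invented purely for illustration.

#define DECAY_FACTOR 0.95f   /* assumed: lose 5% of the "badness" per skip */
#define REALLY_BAD  -3.00f   /* assumed: below this, never decay           */

/* Called each time a learned-to-be-bad book move is skipped; its value */
/* drifts slightly toward zero unless it is bad enough to stay buried.  */
void AgeLearnedValue(float *learned) {
  if (*learned < 0.0f && *learned > REALLY_BAD)
    *learned *= DECAY_FACTOR;
}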

This "aging learning" is not presently implemented in Crafty. The main issue is that moves that are "sort of bad" can have their learned value decayed over time, but moves that are "really bad" should be avoided forever. The current opening book is so large that this has not yet become a problem. However, I have found it useful to occasionally (every year or so) to clear the learning data from the book and let Crafty start fresh. After a year, it is likely that it plays differently enough to justify trying old openings again, although the ones that simply lose material are pointless and will produce losses again.

6.5 Learning may be "opponent specific"

It is possible that learning might apply more to the abilities of the opponent than to the characteristics of the opening line/position. For example, Crafty might play a wild gambit line and overwhelm an opponent that does not like wild tactical lines, and then conclude (learn) that this opening looks great. Games against a stronger opponent (particularly against a computer) will show this to be foolish and the second learning cycle will turn that variation off in the book.

There is an up side to this, however, as it has a significant benefit in match play against the same opponent. If Crafty wins, it will try to repeat the same opening again, putting the onus on the opponent to vary and avoid a repeat loss. If Crafty loses, it will not repeat that opening, so that if the opponent has book learning capabilities, it will be unable to repeat the winning opening line against Crafty. Matches between a learner and a non-learner can be both ugly and repetitive, for obvious reasons. There has been much discussion about "is this fair?" I think the answer is that if a program author eschews book learning because it is complex to design and write, then he/she leaves the door open for terrible match results. The answer is to do whatever you can to help your engine avoid repetitive losses, and this is clearly one such solution, one that is probably much better than simply trying to randomize play to prevent repetitions of old games.

7. Conclusions

This has turned into a highly effective methodology for creating a useful and modern opening library for Crafty. It requires no intervention from a human operator and no hand-tuning or hand-crafting of an opening book to fit the "style" of the program, because the program is given the task of modifying the book after each game to favor opening variations that result in good positions and to reject opening variations once they are shown to lead to bad positions.

With older chess programs (Cray Blitz to name one that I worked on) this would have been a wonderful methodology to have available, because the opening book was the weakest link in the way the program played. We actually used to "hold our breath" until we dropped out of the book and started searching on our own, hoping that the first search result would be something that was not already losing. I can recall several games that were played at ACM Computer Chess tournaments where a program was lost before it ever searched the first node.

Another problem is someone "cooking" an opening for you, based on a game you played in a prior round or prior tournament. In one notorious case, Chess 4.x lost a famous game to Chaos at a World Computer Chess Championship. The opening book from Chess 4.x was carried forward to later versions, and at a later event with humans and computers, two humans played the same opening and won their games in the same way; the new program (named NuChess) was helpless to prevent this without human intervention. With Crafty, this is impossible, and we don't have to be on guard to prevent a repetition of a losing book line. If we lose in round N of a tournament, we don't have to worry about playing that same opening in later rounds, nor take time to modify the book by hand to avoid doing so.

This learning methodology was originally intended to learn what not to play to avoid poor positions, but it also is able to learn which openings actually lead to positions where the program is capable of using its skills to best advantage to produce winning results.

Crafty also uses the transposition table to implement a third form of learning, as described by David Slate [Slate87]. This is yet another learning-based defense, and gives some protection in the case where an opponent tries some odd opening (1. h3 ... 2. a3 ...) that takes Crafty out of book very early. In such a case, it is still possible to find a way to beat Crafty, and book learning won't help at all since there are no book moves to choose between.

In this case, for the first 10 non-book moves, "position learning" is used to remember positions where the evaluation drops significantly. These positions and evaluations are saved on disk, and loaded into the transposition table as permanent entries when the program is executed. This provides a form of learning, even when there are no book lines, so that it is nearly impossible to repeat the same opening and win the same game over and over.
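The record kept for this form of learning is small; a sketch is shown below, with field names that are illustrative rather than Crafty's actual position-learning file layout.

/* Written when the evaluation drops sharply during the first ten      */
/* non-book moves, and re-loaded into the transposition table as a     */
/* permanent entry when the program starts up.                         */
typedef struct {
  unsigned long long position_key;  /* hash signature of the position    */
  int                value;         /* the (poor) score discovered there */
  int                depth;         /* depth of the search that found it */
} POSITION_LEARN_ENTRY;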

8. Future efforts

There are obviously many things left to try. The most interesting is the concept of "gang-learning" where multiple programs play on a common chess server, and share their learned results in real-time, so that when one plays a game, they all learn from it instantly. The present email facility used to share learning results is too slow to be effective.

A common problem on the chess servers is that a player will find a line that easily wins against a Crafty player, but that Crafty player learns how to avoid the line that causes the problem. The human will then move on to another program and go through the same games again until that program learns to evade that opening just like the first one. This is often used to inflate a player's rating in an obvious way, and it would be nice to see the first program player notify all of its "brothers/sisters" as it learns how to avoid such a loss. Of course, it would also be disconcerting to the humans that use this form of abuse as well.

Another idea that has been suggested several times is to allow Crafty to add to its opening book. After playing a game where the position out of book appears to be favorable, and the first moves out of book lead to a position that is also favorable, those first few non-book moves could be added to the opening book so that the next game is played using the extended book line. This is not unlike the way humans learn openings beyond what the opening books provide: once a few moves have been played and lead to a favorable game, those moves will likely be played again.

The only danger of this is that the moves might be added after playing a quick game (fast time control) while a longer time control might show that the newly added book moves are really not good enough to play. Fortunately, normal book learning would still apply and Crafty would, after playing the longer game, possibly mark as UN-playable the moves it had added previously.

References

H. Berliner, "Computer Chess at Carnegie-Mellon University," in Advances in Computer Chess 4, D.F. Beal (ed.), Pergamon Press, Oxford, pp. 166-180 (1985).

R. Greenblatt, D. Eastlake, and S. Crocker, "The Greenblatt Chess Program," Proceedings of the Fall Joint Computer Conference, pp. 801-810 (1967).

R. Hyatt, H. Nelson and A. Gower, "Cray Blitz," in Advances in Computer Chess 4, D.F. Beal (ed.), Pergamon Press, Oxford, pp. 8-18 (1985).

T. Marsland and J. Schaeffer, Computers, Chess and Cognition, Springer-Verlag (1990).

D. Slate and L. Atkin, "Chess 4.5--The Northwestern University Chess Program," in Chess Skill in Man and Machine, P. Frey (ed.), Springer-Verlag, pp. 82-118 (1977).

D. Slate, "A chess program that uses its transposition table to learn from experience," ICCA Journal, Vol 10, No. 2, pp 59-71 (1987).