Monday, June 30, 2008

A Peculiar Bug

In my postmortem of BrainWorks, I mentioned one of the big things I did right was creating a powerful debugging infrastructure directly in the code base itself. In general, it's easiest to test and maintain software when as much of the testing is fully automated as possible. Now that's easy for a program like Excel, where you can create a battery of sheets with strange formulas that might cause errors, and then confirm that the numbers all match up. If the software creates something measurable, it can usually be automatically tested.

Naturally, this means automated testing is far harder for Artificial Intelligence than other areas of computer science. The goal of BrainWorks is "bots that appear to be human". How do you teach a computer to measure whether a player's decisions appear to be human? That's tantamount to writing a program that can judge a Turing test, which is as hard as passing one. Fully automated testing of the AI isn't an option, but there are certainly things that can assist testing. All you have to do is identify a measurable component and check whether it meets human expectations.

For aiming, BrainWorks already measures accuracies with each weapon in each combat situation. The bot knows that it scores perhaps 35% with the shotgun against an enemy a few feet away and 5% against an enemy a hundred feet away. But it doesn't know if those numbers are reasonable until I tell it what to expect. The vast majority of the testing I did with aiming involved forcing a bot to use a particular weapon, making it play for hours, and then checking if the numbers "looked good". Generally they didn't, and I had to figure out why they weren't matching my expectations. In this testing process, I uncovered a variety of interesting bugs. Large sections of the aiming algorithm were rewritten around a dozen times trying to get things right. I found several errors in how accuracy was even being measured. But the strangest error I encountered was something totally unexpected.
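
Under the hood, this kind of spot-checking is just bookkeeping: count shots and hits per weapon and situation, then compare the measured accuracy against the number a human would get. Here's a minimal sketch of that idea, using hypothetical names rather than the actual BrainWorks data structures:
// Minimal accuracy bookkeeping sketch (hypothetical names, not the
// actual BrainWorks structures).
typedef struct
{
    int shots_fired;    // Total shots taken with this weapon in this situation
    int shots_hit;      // Shots that actually struck an enemy
    float expected;     // Accuracy a human would get here (e.g. 0.35 for 35%)
} accuracy_record_t;

// Record the outcome of a single shot.
void AccuracyRecordShot(accuracy_record_t *record, int hit)
{
    record->shots_fired++;
    if (hit)
        record->shots_hit++;
}

// Returns 1 if the measured accuracy is within "tolerance" of the human
// expectation, or 0 if someone needs to investigate.
int AccuracyLooksGood(const accuracy_record_t *record, float tolerance)
{
    float measured;

    if (record->shots_fired == 0)
        return 1;
    measured = (float) record->shots_hit / (float) record->shots_fired;
    return (measured >= record->expected - tolerance) &&
           (measured <= record->expected + tolerance);
}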

I was monitoring Railgun accuracy as a way of testing the overall ability to aim. It's an instant-hit weapon with no spread and infinite range, so it's a great initial test case. I loaded up eight bots on a wide open level and forced them all to have exactly the same skill, then ran them for several hours. Curiously, when I checked the results, their accuracies weren't all the same. The best had an accuracy around 75% and the worst was around 65%. Moreover, their scores reflected this.

I activated some mechanics in the system to modify aiming behavior. First I turned off all errors, so the aiming should be flawless, like watching a human who doesn't make mistakes. Their accuracies were still stratified. Then I completely circumvented the entire aiming code, forcing the bots to aim directly where they wanted to. That gave the bots what most people think of as aim hacking, so their aim should have been perfect. But even then, testing showed that some bots got higher accuracies than others. Sure, one bot would score 99.9% accuracy, but another would only score 97%. When a bot has perfect information and reflexes, it should not miss one in 30 shots.

Then one day I noticed that all eight bots were sorted in alphabetical order. The bot whose name came first alphabetically had the highest score (and accuracy), down to the bot whose name came last having the lowest. Since the odds of this happening by chance are 1 in 8! = 40,320, I considered it curious but still possibly a coincidence. So I tested again, and each time the bots were sorted alphabetically! That was the final clue I needed to isolate this bug.

The script I used to start testing adds the bots in alphabetical order, so I tried swapping the order in which the bots were added, and their accuracies changed as a result. Each time, the most accurate bot was the one added to the game first, and the least accurate bot was the one added last. For some reason, the internal numbering of the bots was affecting their aim.

So why exactly was this the case? I'll let you puzzle over it for the week. Next week I'll explain why this happened and the extreme amount of work that went into solving it.

Monday, June 23, 2008

Abstract Art

As part of a special program in high school, one of my classmates, Chris, applied to work under an orchestral conductor during his senior year. Chris was a brilliant musician, but the interview process with this particular conductor was a grueling two hour endeavor, much more than the typical person in the program would have gone through. The conductor wanted to find out if Chris could do more than play music well. He wanted to know if Chris understood what music was. Chris related that the experience was more of an intellectual sparring match than an interview, and everything revolved around answering one question:

What is art?

Now there are all kinds of meaningless philosophical answers to this question. "Art is whatever you say it is" or "Art is beauty". If you dig into these answers, you just end up with the tautology "Art is art". That's true, mind you, but not very helpful. However, the music conductor had a very good answer to the question, and he accepted Chris' application when Chris understood the answer.

Art is expression.

When I heard the answer, it really resonated with me. That's a real answer, not a tautology! The purpose of art is for the artist to convey a point to those people experiencing the art. That means art can be measured by two factors:
  • How important the point is to the audience
  • How well the audience understands the point
Note that the audience is a crucial part of art. The moment I heard this answer, I understood why I'd always hated abstract art. It's bad art because it neglects the audience. Generally, abstract art boils down to one of two types.

Type 1: "This black square set against a mess of orange and red circles represents the current triumph of humanity against nature, but the ultimate futility of the struggle." While that might be a good point, it's not particularly novel, and the point isn't conveyed very well. Other artists have done a much better job of creating works that express this point without even intending to. Fundamentally I feel like this art exists so that art snobs can feel elitist, because they "understand" the point and other people don't. But obfuscating your point doesn't make you clever; it makes you a bad artist.

Type 2: "This black square set against a mess of orange and red circles doesn't represent anything because art doesn't need to have a point. It can mean anything and everything!" In that case, the artist isn't expressing anything at all. They are doing an exceptional job of doing nothing. It's not art because it's not expressing a real point.

This doesn't mean art needs to be hyper-realistic. Reality isn't as fuzzy as the impressionistic paintings Monet did of water lilies, yet millions of people from different cultures and backgrounds have all recognized the beauty of that artwork. The true test of art is not whether its point is realistic, but whether the point has been successfully conveyed to the audience.

With that as the framework, I'd like to talk about something different from Artificial Intelligence: the writing of this column. You might notice that I express a lot of things in layman's terms, glossing over details, sometimes going back to them and sometimes not. It's my goal that this column never requires more than a first-year undergraduate's experience to understand. That's deliberate: it's crucial that people can understand what I'm writing. If I can't explain things simply, then I'm not doing my job as a writer.

Each week I pick a topic that might be interesting. This column is partially just a personal soap box-- I talk about things I find interesting and usually relate them to AI. But I cannot escape the purpose of art: art is only meaningful to the extent that the audience cares. So I want to get a better understanding of what topics and styles you, the reader, appreciate the most. It's my job to write about things you want to read.

I analyzed the articles I've written and they fall into one of four categories:
  • Technical discussions of exactly how an algorithm works. Discusses actual code.
  • Overview discussions of how to approach an algorithm. Discusses situations that would require programming, but not actual code.
  • Reflections on artificial intelligence and computer science in the abstract. Discussion relates to philosophy.
  • Philosophical discussions. Usually relates to intelligence, may or may not relate to artificial intelligence.
I'd love to know what kinds of articles people enjoy the most so I can plan a good mix of them. Please, post your preferences in the comments! And if you're interested in hearing a particular topic covered, also mention that and I'll try to cover it at a later date.

Monday, June 16, 2008

Give It A Go

While my primary area of expertise is First Person Shooter AI, I still get a number of questions about other AI applications. One of the hardest game AI problems to date, if not the hardest, is the game of Go. What's so surprising is that Go has simpler rules than Chess, yet good Go AI is several orders of magnitude more complicated than good Chess AI. Now consider that the best Chess AI to date only beat a grand master, Kasparov, when it was specifically trained to counteract Kasparov's play style. In the first set of games, which Kasparov won, he commented that he did better once he started making what he considered the second best move instead of the best move. Apparently the AI was so hyper-tuned against Kasparov's play style that it struggled against similarly good plays it did not expect Kasparov to make.

Yes, that's the best the world has done for Chess AI: A program that can defeat one really good player. But Go? The best AI programs just rank as decent amateurs. And listen to how simple the rules are:
  • Players take turns placing stones of their color (black or white) on a square board (usually 19x19 but sometimes 9x9).
  • Stones of the same color that are adjacent vertically or horizontally are part of the same group and live or die together.
  • When a group is completely surrounded (it has no adjacent empty spaces left), that group is captured. Each lost stone is -1 point at the end of the game.
  • If a play is made that would cause both the placed stone and an opposing group to be surrounded, the placed stone lives and the opposing group is captured.
  • You may not make a play that results in a previous board position.
  • The game is over when both players consecutively pass, opting not to play a stone. Each player scores 1 point for each empty space that is enclosed only by their color of stones.
What makes this problem so hard to solve? Well for starters, the 19x19 board is enormous. There are a lot of possible positions. Moreover, even though the plays are just of single stones, they come together to form different shapes like pixels form a picture. Different shapes have different properties, strengths, and weaknesses. A typical alpha-beta search becomes rapidly unusable when the average branching factor is on the order of 300.

Nevertheless, in the past few years there have been a few significant advancements in the field of Go AI. Rather than trying to solve Go in the same way Chess would be solved, some Monte-Carlo algorithms have shown a lot of promise. In particular, in 2006, "Upper Confidence Bounds applied to Trees" (or UCT) was developed. I'm not an expert on this subject, but in brief, the concept works like this.

While it's very hard to determine which space on a 19x19 board is the best, it's very easy to compute the score of a Go board. Contrast this with Chess, where the actual pieces in play are secondary to the question of whether or not a King has been captured. So the very rough idea is to try most options (pruning the obviously bad plays) and play a semi-random game for each of them (again, pruning obviously bad plays). Far less time is spent at each stage of the tree. If there are 50 possible moves that aren't terrible, one of them is just selected at random; there's no aggressive search to determine which of those 50 is the best. When the test game is done for a given play, its final score provides an estimate of how good that particular initial placement is. Other placements are tried, and the one with the highest estimated score is selected as the computer's official move.
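
To make the idea concrete, here's a stripped-down sketch of flat Monte-Carlo move selection. The board type and its helper functions (LegalMoves(), CopyBoard(), PlayMove(), RandomPlayout(), ScoreBoard()) are hypothetical placeholders, and real UCT layers a search tree and confidence bounds on top of this basic loop:
// Flat Monte-Carlo move selection sketch. Every helper here is a
// hypothetical placeholder for a real Go engine's board code.
#define MAX_MOVES 361           // One entry per point on a 19x19 board
#define PLAYOUTS_PER_MOVE 100   // Semi-random games per candidate move

typedef struct board_s board_t;

int   LegalMoves(const board_t *board, int moves[MAX_MOVES]);  // List candidate moves, return the count
void  CopyBoard(board_t *dest, const board_t *src);            // Duplicate a board position
void  PlayMove(board_t *board, int move);                      // Apply one move
void  RandomPlayout(board_t *board);                           // Play semi-random moves until the game ends
float ScoreBoard(const board_t *board);                        // Final score from the computer's perspective

// Estimate each candidate move by averaging the final scores of random
// playouts, and return the move with the best average.
int MonteCarloSelect(const board_t *board, board_t *scratch)
{
    int moves[MAX_MOVES];
    int num_moves, i, j, best_move;
    float total, average, best_average;

    num_moves = LegalMoves(board, moves);
    if (num_moves == 0)
        return -1;              // No reasonable move remains; pass
    best_move = moves[0];
    best_average = -1e30f;

    for (i = 0; i < num_moves; i++)
    {
        total = 0.0f;
        for (j = 0; j < PLAYOUTS_PER_MOVE; j++)
        {
            CopyBoard(scratch, board);
            PlayMove(scratch, moves[i]);
            RandomPlayout(scratch);
            total += ScoreBoard(scratch);
        }
        average = total / PLAYOUTS_PER_MOVE;
        if (average > best_average)
        {
            best_average = average;
            best_move = moves[i];
        }
    }
    return best_move;
}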

Of course, the actual algorithm is quite a bit more complicated than that, but that's the general idea behind the innovation. The crucial lesson here for AI designers is this:

Focus on your goal.

In the end, Go is just a score maximization problem. It's very easy to get bogged down in the details of whether a move nets 5 points or 7 points, quickly producing a state where far too many calculations need to be done. Often the best solution involves determining how much processing time you can actually spend, and then working backwards to figure out which methods will give you the most information for the time spent. In Go, calculating the board's score is relatively fast, so the latest algorithms that exploit this fact have a major advantage over those that do not.

Monday, June 9, 2008

Making Things Right

You do it thousands of times each day without even thinking about it. It happens when you put on your clothes, brush your teeth, eat your food, and open a door. Every time you walk down a hallway, drive down the street, or type at your computer, your brain is rapidly, subconsciously processing all kinds of data. Most of the information is uninteresting and gets discarded. But when the brain encounters little bits of unexpected data, for the most part it seamlessly reacts, performing minor adjustments to correct whatever mistakes were made in fine motor control. Someone might brush your shoulder as they pass you on the street, and without even thinking about it you step to the side. Or you see the car in front of you start braking and you slow down as well. You don't swallow your food until the pieces feel sufficiently chewed in your mouth.

Somehow, the human brain performs thousands of minor error corrections while carrying out basic motor control. Large motions are fast, but they aren't precise. How exactly does this work? If you mean to move your hand 10 centimeters and actually move it 11, your brain compensates for the extra centimeter and moves your hand back. Well, maybe it only moves back 9 millimeters instead of a full centimeter, but it gets close enough that it doesn't matter.

When I discussed how BrainWorks emulates human wrist motion, I explained this elegant mathematical model where errors accumulate based on muscle acceleration. But I brushed over the other half of this system, where the bot accounts for its error and readjusts. With error correction that's not good enough, a bot's aiming just looks erratic. And with error correction that's too accurate, the aiming looks robotic. There's a very fine balance to getting error correction that looks human, and to do that you need to understand how the human brain corrects muscle errors. There's just one problem with that...

I have no idea how it happens.

There have been numerous studies on this kind of thing, but I confess I haven't read them. Maybe they would have been useful; maybe they wouldn't. Instead, I tried a number of different methods for error correction (that number is 5), and only one of them produced realistic results. While I can't tell you how the human brain solves this problem, I can explain the one that worked.

From the bot's perspective, here's the problem. It thinks it's aiming at position X, but in reality it's aiming at X+e, where e is the error factor. As the bot corrects its error (i.e. understands the mistake it has made), the error factor e approaches zero. An error factor of zero means the bot thinks it's aiming at X and is actually aiming at X+0 = X, meaning it perfectly understands where it is aiming. Most of my failed attempts used some form of exponential or linear decay. Ironically, the algorithm that worked is by far the simplest. It's just a canonical Monte Carlo algorithm:
  1. Set the error value to a random number between -e and +e
  2. There is no step 2
It's just that simple. You can read the full explanation behind this algorithm in the DataPerceiveCorrect() function in ai_view.c. It's a 100 line comment describing, quite literally, one line of actual code. Here's the body of the function:
float DataPerceiveCorrect(float estimate_offset)
{
    // Pick a new value in (0, +error) or (-error, 0)
    //
    // NOTE: It doesn't matter what the sign of error is; the random
    // function will preserve it.
    return random() * estimate_offset;
}
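If you want to see why one multiplication is enough, here's a standalone harness you can compile outside the game. It uses the standard library's rand() in place of Quake 3's random() macro (which returns a float between 0 and 1) and just prints how the error shrinks toward zero:
// Standalone convergence test for the correction step.
#include <stdio.h>
#include <stdlib.h>

// Stand-in for Quake 3's random() macro: a float in [0, 1].
static float random_unit(void)
{
    return (float) rand() / (float) RAND_MAX;
}

// Same one-line correction as DataPerceiveCorrect(): pick a new error
// somewhere between zero and the current error, preserving its sign.
static float PerceiveCorrect(float error)
{
    return random_unit() * error;
}

int main(void)
{
    float error = -12.0f;   // Initial aim error, in arbitrary view units
    int i;

    // Each correction shrinks the error by a random fraction, so it
    // converges quickly but unevenly, like a person homing in on a
    // target rather than halving the mistake every single frame.
    for (i = 0; i < 10; i++)
    {
        printf("correction %d: error %.4f\n", i, error);
        error = PerceiveCorrect(error);
    }
    return 0;
}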
While I won't reproduce that entire 100 line description here, here's a portion of it that explains in more detail why this works:

Humans seem to use the following algorithm for refining estimates. For example, finding the dictionary page that contains a given word.

1) Make a reasonable estimate given the human's understanding of the situation. For example, even though W is the 23rd letter of the alphabet, humans don't look at the (23/26) * MAX_PAGES page of the dictionary when looking up W words, simply because they know the X-Y-Z sections are so short. This indexing is similar to the Interpolation Search algorithm. This result is compared to the actual value (ie. is the guess too high or too low?) and this value is fixed as either an upper or lower bound. In other words, you mark this page with your finger.

2) Possibly make one or two more reasonable estimates to determine both an upper and lower bound that the data must reside in. At this point the human knows that lower < value < upper. He or she knows the precise values of "lower" and "upper" but NOT of "value".

3) Pick a random value in the interval (lower, upper) and compare it to "value". This selection replaces either the lower or upper bound as necessary. Repeat until the selection equals "value".

This might seem unintuitive, but humans don't actually use binary search to narrow down their errors when the range gets sufficiently small. Perhaps it takes too much time to roughly estimate the middle. In practice people will flip through maybe 10 pages at a time, or 1 page at a time, or just pick something and see. It will take more iterations to converge than binary search would but-- and this is crucial-- it takes less time overall than computing the midpoint at each iteration.

That is to say, a computer's most natural method of search is not the same as a human's. Computers usually operate best with binary search, while humans "just try something". Only when I programmed the AI to do something counter-intuitive (for a computer) did the AI seem human.
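
To illustrate the difference, here's a sketch of both search styles over a sorted array of integers. This is only an illustration of the dictionary analogy, not code from BrainWorks:
// Two ways to find "target" in a sorted array of "count" ints.
#include <stdlib.h>

// The computer's natural method: always probe the midpoint.
int BinarySearch(const int *values, int count, int target)
{
    int low = 0, high = count - 1, mid;

    while (low <= high)
    {
        mid = low + (high - low) / 2;
        if (values[mid] == target)
            return mid;
        if (values[mid] < target)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;
}

// The human-style method: probe a random index between the bounds and
// let it replace the lower or upper bound. It takes more iterations,
// but no effort is spent picking the "best" place to look.
int RandomNarrowingSearch(const int *values, int count, int target)
{
    int low = 0, high = count - 1, probe;

    while (low <= high)
    {
        probe = low + rand() % (high - low + 1);
        if (values[probe] == target)
            return probe;
        if (values[probe] < target)
            low = probe + 1;
        else
            high = probe - 1;
    }
    return -1;
}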

Monday, June 2, 2008

It's All In The Wrist

I consider the aiming algorithm used by the BrainWorks bots to be the biggest success of the project. More than one experienced Quake player has told me how amazed they were to spectate the bots and watch their aiming, since it appears so realistic. I went through close to a dozen different iterations of the basic aiming engine until I got results I was pleased with, and that research took up most of the six year development period. I estimate roughly 60% of the work was spent on aiming! (If you're curious, item pickup was about 30% of the time and everything else took 10% total.) To me, the objective was to create a bot whose aiming realistically matched a human player's aiming. There's a secret to how I did this:

It's all in the wrist.

Most games don't let you spectate their AI units because it would be immediately obvious how the AI is cheating. But if you could, you would almost always see one of two things. The AI might look almost directly at the player but move its camera away from the ideal position by a random amount, so that it misses sometimes at random. Or the AI might turn at a fixed, slowish speed but always (eventually) aim more or less perfectly, so that it only misses right after the target suddenly changes direction.

The problem is that these algorithms don't model how a human wrist interacts with a mouse. The result is that the AI won't miss at times a human would miss, and will hit when a human wouldn't. For example, rapidly strafing side to side will confuse an AI whose aim moves at a fixed, slowish rate, making it constantly miss. With these traditional algorithms, no matter what parameters the bot has for speed and error frequency, its aiming will never look like a human's. That means its accuracy with each weapon will never match human accuracies either.

The solution to modeling the aiming problem is to stop thinking about the monitor (whether the crosshairs are lined up) and start thinking about the mouse (how human motion changes the crosshairs). A human hand on a mouse doesn't move at a fixed speed, and it doesn't "randomly" jump to the wrong area. The motion involves acceleration and deceleration. The faster a human moves a muscle, the larger the potential error. Humans can increase the precision of their movements by moving slower, so typical human motion involves a rapid acceleration to a desired speed, some time moving at that speed, and then deceleration to a rest state.

Mathematically this means that if a human attempts to accelerate at rate X m/s^2, the actual acceleration is somewhere in the interval [X*(1-e), X*(1+e)], where e is the maximum allowed error (e.g. 10%). A small error in acceleration is magnified each time the acceleration is applied to the velocity, and the velocity error in turn magnifies the position error each time the velocity changes the view position. This, combined with an extremely simple, elegant error correction algorithm, provides an excellent simulation of the view motion created by a human hand on a mouse.
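
Here's a minimal sketch of that error model for a single view axis, using hypothetical names rather than the actual BrainWorks code. Each frame, the intended acceleration is perturbed by up to the maximum error fraction, and that noise compounds through velocity into position:
// One-axis noisy acceleration sketch (hypothetical names).
#include <stdlib.h>

// A float in [0, 1], standing in for the game's random() macro.
static float random_unit(void)
{
    return (float) rand() / (float) RAND_MAX;
}

typedef struct
{
    float position;     // Current view angle on this axis (degrees)
    float velocity;     // Current turn rate (degrees per second)
} view_axis_t;

// Advance one axis by one frame. "accel" is the acceleration the bot
// intends to apply, "max_error" is the maximum fractional error (e.g.
// 0.10 for 10%), and "dt" is the frame time in seconds.
void ViewAxisAdvance(view_axis_t *axis, float accel, float max_error, float dt)
{
    // Actual acceleration lands somewhere in [accel*(1-e), accel*(1+e)]
    float error_scale = 1.0f + max_error * (2.0f * random_unit() - 1.0f);
    float actual_accel = accel * error_scale;

    // The acceleration error compounds into the velocity, and the
    // velocity error compounds into the position, just like a real
    // hand overshooting with a mouse.
    axis->velocity += actual_accel * dt;
    axis->position += axis->velocity * dt;
}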

That's basically the algorithm BrainWorks implements. The bot splits its view state into two distinct axes: up and down on the simulated mousepad pitches the bot's view up or down, and left and right rotates it left and right. This reduces the view problem to two identical one-dimensional problems. For each axis, the bot decides how much to accelerate and decelerate the movement of its simulated mouse to get from its current view position to its selected view position as quickly as possible. The small acceleration error factor is all that's needed to generate the smooth overshooting mistakes humans make when they try to turn quickly with a mouse.

The calculus to compute exactly how much time to spend accelerating and decelerating is a bit complicated: there's a 150 line comment explaining the derivation in the ViewAxisModify() function in ai_view.c. But the resulting equations are fast and easy to compute. It's 30 lines of actual code, with only one square root, four floating point divides, and a whole bunch of adds and multiplies.
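
For the curious, here's a simplified version of where that one square root comes from. This sketch assumes the axis starts and ends at rest and uses the same acceleration magnitude in both phases; the real ViewAxisModify() also handles a non-zero starting velocity and a maximum turn speed, which is where the rest of the math comes in:
// Time-optimal accelerate-then-decelerate timing for one axis, in the
// simplified case of starting and ending at rest (accel must be > 0).
#include <math.h>

// Returns the time to spend accelerating (and, symmetrically,
// decelerating) to cover "distance" degrees with acceleration "accel".
float ViewAxisAccelTime(float distance, float accel)
{
    // Accelerating for time t covers 0.5*a*t^2, and decelerating back
    // to rest covers the same distance again, so distance = a*t^2.
    return sqrtf(fabsf(distance) / accel);
}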

As I said, it's all in the wrist.