Monday, June 30, 2008

A Peculiar Bug

In my postmortem of BrainWorks, I mentioned that one of the big things I did right was creating a powerful debugging infrastructure directly in the code base itself. In general, software is easiest to test and maintain when as much of the testing as possible is fully automated. Now that's easy for a program like Excel, where you can create a battery of sheets with strange formulas that might cause errors, and then confirm that the numbers all match up. If the software creates something measurable, it can usually be automatically tested.

Naturally, this means automated testing is far harder for Artificial Intelligence than for other areas of computer science. The goal of BrainWorks is "bots that appear to be human". How do you teach a computer to measure whether a player's decisions appear human? That's tantamount to writing a program that can judge a Turing test, which is as hard as passing one. Fully automated testing of the AI isn't an option, but there are certainly things that can assist testing. All you have to do is identify a measurable component and check whether it meets human expectations.
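
In sketch form, that kind of semi-automated check is nothing fancy: the code measures something on its own, and a human supplies the range a real player would plausibly fall into. Something like this (simplified for illustration; the names here are made up, not the real BrainWorks interface):

    #include <stdio.h>

    /* A measured value plus the range a human player would plausibly produce.
       The range has to come from a person; only the measurement is automatic. */
    typedef struct
    {
        const char *name;        /* label for the report                */
        float measured;          /* value the AI code actually recorded */
        float expect_low;        /* low end of the human-like range     */
        float expect_high;       /* high end of the human-like range    */
    } ai_expectation_t;

    /* Returns 1 if the measurement looks human, 0 if it needs investigation. */
    int AI_CheckExpectation(const ai_expectation_t *e)
    {
        if (e->measured < e->expect_low || e->measured > e->expect_high)
        {
            printf("SUSPECT: %s = %.3f (expected %.3f to %.3f)\n",
                   e->name, e->measured, e->expect_low, e->expect_high);
            return 0;
        }
        return 1;
    }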

For aiming, BrainWorks already measures accuracies with each weapon in each combat situation. The bot knows that it scores perhaps 35% with the shotgun against an enemy a few feet away and 5% against an enemy a hundred feet away. But it doesn't know if those numbers are reasonable until I tell it what to expect. The vast majority of the testing I did with aiming involved forcing a bot to use a particular weapon, making it play for hours, and then checking if the numbers "looked good". Generally they didn't, and I had to figure out why they weren't matching my expectations. In this testing process, I uncovered a variety of interesting bugs. Large sections of the aiming algorithm were rewritten around a dozen times trying to get things right. I found several errors in how accuracy was even being measured. But the strangest error I encountered was something totally unexpected.
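
The bookkeeping behind those numbers is simple: conceptually it's just a table of hits and shots per weapon per combat situation. Roughly like this (the names and the number of situation buckets are simplified for illustration, not lifted from the real code):

    #define WEAPON_COUNT 9   /* one set of counters per weapon            */
    #define SITUATIONS   3   /* e.g. near / mid / far; simplified buckets */

    typedef struct
    {
        int shots[WEAPON_COUNT][SITUATIONS];
        int hits [WEAPON_COUNT][SITUATIONS];
    } accuracy_record_t;

    /* Call once per shot fired; "hit" is nonzero if the shot connected. */
    void Accuracy_Record(accuracy_record_t *acc, int weapon, int situation, int hit)
    {
        acc->shots[weapon][situation]++;
        if (hit)
            acc->hits[weapon][situation]++;
    }

    /* Accuracy is just hits divided by shots for that weapon and situation. */
    float Accuracy_Get(const accuracy_record_t *acc, int weapon, int situation)
    {
        int shots = acc->shots[weapon][situation];
        return shots ? (float) acc->hits[weapon][situation] / shots : 0.0f;
    }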

I was monitoring Railgun accuracy as a way of testing the overall ability to aim. It's an instant-hit weapon with no spread and infinite range, so it's a great initial test case. I loaded up eight bots on a wide open level, forced them all to have exactly the same skill, and ran them for several hours. Curiously, when I checked the results, their accuracies weren't all the same. The best had an accuracy around 75% and the worst was around 65%. Moreover, their scores reflected this.
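
The expectation being violated here is easy to state: eight identically skilled bots should end up with nearly identical long-run accuracy. The check itself is trivial; roughly this (the names are invented, and the tolerance you pass in is whatever gap you're willing to call noise):

    #include <stdio.h>

    /* Flag the run if the spread between the best and worst accuracy is
       wider than the given tolerance (e.g. 0.05 for five percent). */
    void CheckAccuracySpread(const float *accuracy, int num_bots, float tolerance)
    {
        float best = accuracy[0], worst = accuracy[0];
        int i;

        for (i = 1; i < num_bots; i++)
        {
            if (accuracy[i] > best)  best  = accuracy[i];
            if (accuracy[i] < worst) worst = accuracy[i];
        }

        if (best - worst > tolerance)
            printf("SUSPECT: accuracies range from %.1f%% to %.1f%%\n",
                   worst * 100.0f, best * 100.0f);
    }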

I activated some mechanics in the system to modify aiming behavior. First I turned off all errors, so the aiming should have been flawless, like watching a human who never makes mistakes. Their accuracies were still stratified. Then I completely circumvented the entire aiming code, forcing the bots to aim directly where they wanted to. That gave the bots what most people think of as an aim hack, so their aim should have been perfect. But even then, testing showed that some bots got higher accuracies than others. Sure, there was one bot that would score 99.9% accuracy, but another bot would only score 97%. When a bot has perfect information and reflexes, it should not miss one in 30 shots.
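
Those overrides are just debug switches wrapped around the aiming pipeline. Stripped down, the structure is something like this (every name here is invented for illustration and the two stage functions are stand-ins, not the real interface):

    #include <string.h>

    typedef float vec3_t[3];

    int debug_no_aim_error = 0;   /* 1 = skip the deliberate human-style error */
    int debug_perfect_aim  = 0;   /* 1 = bypass the aiming code entirely       */

    /* Stand-ins for the real aim selection and aim error stages. */
    static void ComputeAim(const vec3_t ideal, vec3_t out)
    {
        memcpy(out, ideal, sizeof(vec3_t));
    }
    static void ApplyAimError(vec3_t out)
    {
        out[0] += 0.5f;   /* placeholder perturbation */
    }

    void SelectAimAngles(const vec3_t ideal, vec3_t out)
    {
        if (debug_perfect_aim)
        {
            memcpy(out, ideal, sizeof(vec3_t));   /* aim exactly where desired */
            return;
        }

        ComputeAim(ideal, out);       /* normal aim selection                  */

        if (!debug_no_aim_error)
            ApplyAimError(out);       /* intentional, human-style imperfection */
    }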

Then one day I noticed that all eight bots were sorted in alphabetical order: the bot whose name came first alphabetically had the highest score (and accuracy), down to the bot whose name came last having the lowest. Since the odds of that happening by chance are 1 in 8! = 40,320, I considered it curious but still possibly a coincidence. So I ran the test a few more times, and each time the bots were sorted alphabetically! That was the final clue I needed to isolate this bug.

The script I used to start testing adds the bots in alphabetical order, so I tried swapping the order in which the bots were added, and their accuracies changed as a result. Each time, the most accurate bot was the one added to the game first, and the least accurate bot was the one added last. For some reason, the internal numbering of the bots was affecting their aim.
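
For what it's worth, the pattern I was eyeballing each run is trivial to check mechanically: does accuracy strictly decrease with the order the bots were added, i.e. with their internal numbering? A quick sketch (the accuracy values below are made up for the example):

    #include <stdio.h>

    /* Does accuracy strictly decrease with add order (internal bot number)? */
    int RankingMatchesAddOrder(const float *accuracy, int num_bots)
    {
        int i;
        for (i = 1; i < num_bots; i++)
            if (accuracy[i] > accuracy[i - 1])
                return 0;              /* a later bot beat an earlier one */
        return 1;
    }

    int main(void)
    {
        /* accuracy[i] belongs to the i-th bot added to the game */
        float accuracy[8] = { 0.999f, 0.995f, 0.992f, 0.990f,
                              0.985f, 0.982f, 0.978f, 0.970f };

        printf("sorted by add order: %s\n",
               RankingMatchesAddOrder(accuracy, 8) ? "yes" : "no");
        return 0;
    }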

So why exactly was this the case? I'll let you puzzle over it for the week. Next week I'll explain why this happened and the extreme amount of work that went into solving it.

5 comments:

Joe Osborn said...

I have a hunch that your bug was in an RNG not being seeded independently for each bot. RNG state is the kind of invisible side effect I've come to loathe.

Dave Mark said...

While I will have to think a bit more on your puzzle, one of the things you mentioned early on made me wonder if you had seen this column of mine.

Does This Mistake Make Your AI Look Smarter?

Anonymous said...

yay for puzzles :)

my hypothesis:
bots occupy client slots in the same order as they are added.
in the very rare cases of two clients killing a third one simultaneously, it is always the one with the lower slot id that gets the frag, because his usercommands are processed first. the other one gets a missed shot.
though that does not explain why the accuracy difference got smaller after you decreased the aim "noise".

Anonymous said...

cyrri, sure it's getting less with decreased noise, because the noise was making some bots miss even more shots at random, but with a noise of 0% i think it's exactly the problem you described. But it's not only possible when two bots kill a third one; it can also happen when they simultaneously kill each other. the first bot that gets computed gets the hit...

Ted Vessenes said...

Dave: I hadn't read that post before, but it's a good read. I don't think about my AI so much in terms of mistakes versus correctly functioning code as I do in terms of "does this meet the requirement or not?" That said, I've definitely had instances where something I thought would fail worked perfectly well when I tried it. I guess you could classify that as a bug, but it's a bug I'll happily leave in my code!