BrainWorks: debugging

Showing posts with label debugging. Show all posts

Monday, July 21, 2008

Gaming The System

Typically, a game artificial intelligence developer views their job as creating an opponent that will entertain the player. There's nothing wrong with this view. After all, if the player isn't having fun, it doesn't matter what the AI does well. The most realistic play in the world doesn't mean a thing if the player doesn't enjoy playing against it. It's possible to make AI that plays like that, of course. If a human can play so well it frustrates their opponent, an AI can do that too. But is the problem the AI design or the game design? I'd argue the game design is at fault. In other words,

AI design is a counterpoint to game design.

Game design really has two parts to it: making the game fun and making the game challenging. Alternatively, you can think of this as making the game fun for the winners and making the game fun for the losers. If a game is too easy for you, then chances are your opponents are getting crushed and aren't having a good time. A good multiplayer game is one in which everyone enjoys it regardless of whether they win. And a good single player game is one where it's neither too hard nor to easy to win.

That means AI can be a valuable tool in the design feedback loop for a game. If the AI continually exploits a certain unsatisfying strategy, you can expect players will do the same, and the game should be changed. When the AI uses a variety of interesting strategies, that's a good sign your game design is solid.

In the design of BrainWorks, I came across a few instances of AI behavior that commented on the game design of Quake 3. For example, on any level with a reasonably accessible BFG, the bots will do everything in their power to grab and use the weapon. They might even camp BFG ammo just to make it harder for other people to use the gun. While it makes gameplay boring and predictable, they aren't necessarily doing the wrong thing. The AI determined that the BFG was 10 times better than the next best option and made it choices accordingly.

There's also the issue of the machinegun. You might notice that if there are a lot of players on a level, the bots will sometimes flat out skip weapon upgrades and just use the machinegun. This also isn't a mistake. When there are more players on the level, each player's life expectancy decreases. When the bot won't live that long anyway, there's no point in wasting time for a new weapon. The starting machinegun is only slightly worse damage and comes with a reasonable amount of ammunition.

In the case of the BFG, the weapon is obviously too powerful, but it's pretty clear that was the point. The machinegun is a different case though. Is it really too powerful or do players just start with too much ammo? Or maybe it's fine that this is a strategy when a level becomes overpopulated, and human players that waste too much time picking up weapons are just playing it the wrong way.

This is just opinion, but I believe the problem is the amount of damage the machinegun deals. The reason I think this is that there are actually has two damage values for machinegun bullets. Normally it's 7 damage per hit, but in teamplay mode it's only 5 damage. Apparently iD software determined that in a teamplayer game, the standard 7 damage was too much. And they're right, but the problem wasn't the weapon's use in team games. It's the weapon's initial power in games with lots of people, even large free-for-all games. What would be best is if the weapon just dealt 5 damage always. Then freshly respawned players had incentive to grab a new weapon as soon as possible. In game design, the simpler solution is usually the best choice.

On a related note, game design is also a hobby of mine. My wife and I have two upcoming board games that will hopefully be released end of this year or beginning of next. If people are interested in more columns explaining the thought process that goes into game design, I'm happy to write more on that subject. In my mind, designing a game is very similar to designing artificial intelligence.

Monday, July 7, 2008

A Simple Solution

As the story ended last week, I had uncovered a very strange bug in BrainWorks. There was up to a 10% accuracy difference between the first bot added to the game and the last bot. No one guessed the correct cause, but this one from cyrri was the closest:

bots occupy client slots in the same order as they are added.
in the very rare cases of two clients killing a third one simultaneously, it is allways the one with the lower slot id that gets the frag, becuase his usercommands are processed first. the other one gets a missed shot.

This actually does happen, but it only accounts for a 1% to 2% accuracy change between the first and last bots. Also, this value increases as bot accuracy increases, since it's more likely that two bots will be lined up for a good shot at the same time.

The real culprit was the mechanism through which the server processes client commands. The server processes each human player's inputs as soon as it receives them (as fast as 3 milliseconds for a human), but the inputs for bots are processed exactly once every 50 milliseconds. In turn, each bot handles its attacks and then moves. Then the next bot is processed, and they do this in the order they were added to the game. See the problem?

Every bot made its decisions based on where the other bots were currently located at the end of the last server frame, but only the first bot will actually attack against targets in those positions. By the time the last bot aims, every target had already spent 50 milliseconds moving. The first bot had 0 ms of latency and the last bot had 50 ms. Now 50 milliseconds isn't too bad, except the last bot was playing as if its latency was 0. That's why it missed around 10% more.

Since one of my original project constraints was that nothing would change in the server code, that meant that at least some bots had to play with 50 milliseconds of latency. There were no changes I could make to reduce this problem. So the solution was to add latency to all the bots. I wrote a feature into the AI core that tracked the positions of all players in the game for the last half second worth of updates or so, for all bots and all humans. Then if a bot needed to know where a target was, it looked up the properly lagged position and did basic extrapolation to guess where the target would be at the exact moment of attack.

Implementing this system gave all the bots very similar accuracies (within 1% due to the issue cyrri pointed out). But now the problem was that all bots the same accuracy as the worst bot, when they should have had the accuracy of the best bot. It turned out this "basic extrapolation" wasn't good enough. The original Quake 3 AI code used linear trajectories to estimate where a target would end up, nothing more sophisticated than that. So if a bot aimed at a target running to the bottom of a staircase, the bot would keep aiming up into the ground for a while, even though a human would know the target would move straight.

I tried some slightly more advanced solutions, doing basic collision checking against walls, but that didn't solve the problem of running down a ledge. I eventually concluded that humans have a learned sense of physics they take into account, and the bots would need that same sense if they were to play like humans. Solving this problem was both straightforward and time consuming.

I made an exact duplicate of the entire physics engine, modified it for prediction, and placed it in the AI code.

Every detail needed to be modeled-- friction, gravity, climbing up and down ledges, movement into and out of water. Even the force of gravity acting on a player on an inclined ledge. Everything had to be duplicated so that the bots could get that extra 10% accuracy in a human-like manner. It was not easy, and I learned far more than I wanted to know about modeling physics in a 3D environment. But in the end, it was worth it to see bots that could aim like humans.

Monday, June 30, 2008

A Peculiar Bug

In my postmortem of BrainWorks, I mentioned one of the big things I did right was creating a powerful debugging infrastructure directly in the code base itself. In general, it's easiest to test and maintain software when as much of the testing is fully automated as possible. Now that's easy for a program like Excel, where you can create a battery of sheets with strange formulas that might cause errors, and then confirm that the numbers all match up. If the software creates something measurable, it can usually be automatically tested.

Naturally, this means automated testing is far harder for Artificial Intelligence than other areas of computer science. The goal of BrainWorks is "bots that appear to be human". How do you teach a computer to measure whether a player's decisions appear to be human? That's tantamount to writing a program that can judge a Turing test, which is as hard as passing one. Fully automated testing of the AI isn't an option, but there are certainly things that can assist testing. All you have to do is identify an measurable component and check if it meets the human expectations.

For aiming, BrainWorks already measures accuracies with each weapon in each combat situation. The bot knows that it scores perhaps 35% with the shotgun against an enemy a few feet away and 5% against an enemy a hundred feet away. But it doesn't know if those numbers are reasonable until I tell it what to expect. The vast majority of the testing I did with aiming involved forcing a bot to use a particular weapon, making it play for hours, and then checking if the numbers "looked good". Generally they didn't, and I had to figure out why they weren't matching my expectations. In this testing process, I uncovered a variety of interesting bugs. Large sections of the aiming algorithm were rewritten around a dozen times trying to get things right. I found several errors in how accuracy was even being measured. But the strangest error I encountered was something totally unexpected.

I was monitoring Railgun accuracy as way of testing the overall ability to aim. It's an instant hit weapon with no spread and infinite range, so it's a great initial test case. I loaded up eight bots on a wide open level and forced them all to have exactly the same skill, then ran them for several hours. Curiously when I checked the results, their accuracies weren't all the same. The best had an accuracy around 75% and the worst was around 65%. Moreover, their scores reflected this.

I activated some mechanics in the system to modify aiming behavior. First I turned off all errors, so the aiming should be flawless, like watching a human who doesn't make mistakes. Their accuracies were still stratified. Then I completely circumvented the entire aiming code, forcing the bots to aim directly where they wanted to. That gave the bots what most people think of as an aim hacking, so their aim should have been perfect. But even still, testing showed that some bots would get higher accuracies than others. Sure, there was one bot that would score 99.9% accuracy, but another bot would only score 97%. When a bot has perfect information and reflexes, it should not miss one in 30 shots.

Then one day I noticed that all eight bots were sorted in alphabetical order. The bot with the first name had the highest score (and accuracy) down to the bot with the last name having the lowest score. Since the odds of this are 1 in 8! = 40,320, I considered this curious but still possibly a coincidence. So I tested it again, and each time the bots were sorted alphabetically! That was the final clue I needed to isolate this bug.

The script I used to start testing adds the bots in alphabetical order, so I tried swapping the order different bots were added and their accuracies changed as a result. Each time, the most accurate bot was added to the game first and the least accurate bot was added last. For some reason, the internal numbering of bots was affecting their aim.

So why exactly was this the case? I'll let you puzzle over it for the week. Next week I'll explain why this happened and the extreme amount of work that went into solving it.

Monday, April 7, 2008

Getting it "Just Right"

There's not much difference between driving 55 and 56. Unless of course the speed limit is 55, in which case that small amount of speed could make you get a speeding ticket. Standard outside temperatures average between 0 and 100 degrees Fahrenheit, but people easily notice the difference between 68 and 72 degrees. And yet for other measurements in our lives, small changes are completely unnoticed. A car with a full gas tank drives just as well as a car with a quarter tank left-- you only notice the problem when the last drop of gas is gone.

When you design artificial intelligence, everything needs to be broken down to numbers, since a computer can only understand numbers. And a lot of times you have no idea what numbers best simulate the behavior you want to elicit from the AI, or even if the number encodes the correct concept.

Think of the problem this way. Suppose I have a value that encodes the maximum error a bot can make while attempting to aim. If the AI doesn't behavior properly with the initial value I selected, I'll want to try other values. By trying different values, I'll rapidly find out whether this number is very sensitive to changes (like temperature) or insensitive (like a gas tank). But if the bot doesn't seem to be doing the right thing, there's still no way to know what the right value is.

A sensitive number is extremely hard to tweak, because you won't see the right behavior until you get things "just right". If the value doesn't encode the right concept, then that "just right" state won't exist, but you'll never know that. You'll just see how all kinds of different values don't work in different ways.

And an insensitive number generally just has an impact when it crosses an important boundary (for example, driving 56 instead of 55, or your gas tank being 0% full instead of 1% full). There's often no indication where this interesting numerical boundary might be.

Additionally, values can depend on each other. Perhaps changing this maximum error value makes some other value be sensitive, when before that value was insensitive (and thus not aggressively tweaked).

All this to say that when you write AI, there's an awful lot of number tweaking going on. Either you do it by simulations, by neural network training, genetic algorithms, or by hand. But one way or another, you'll have dozens of numbers that all need to be "just right" to present the illusion of intelligence. The good news is that once you've spend time finding these values, you'll have excellent insight into how humans approach the problem. That's how research works, I guess.

For example, do you know what value is the single biggest factor in determining a player's aiming accuracy? It's how fast the player moves the mouse, not the error in mouse movement. The most important factor is the maximum allowed acceleration of the bot's simulated mouse. I know this because of all the values that go into aiming (and there's close to a dozen of them), the acceleration factor by far has the largest impact on actual bot accuracy. There are some very good reasons why that is the case, which are a bit to complicated to explain right now. But simply change the the maximum acceleration for high skill bots from 1400 to 1600 and you'll see an immediate increase in actual bot performance. This value was one of the last values I decreased before release, actually, so BrainWorks bots didn't have the same ungodly aim that traditional Quake 3 bots have.

At any rate, getting artificial intelligence to feel "just right" really comes down to finding the appropriate things to measure and the values those measurements should have. There's no right way to do this, and certainly no magic solution. BrainWorks uses statistical modeling for some things (weapon accuracy) and fixed numbers derived from simulations for others (aiming). Seeing how much the final result can change from just minor alterations in one number certainly gives an appreciation for the complexity that goes into genuine intelligence. If changing just one of twelve values has a dramatic impact on how skilled a bot appears, try to imagine the complexity of millions of neurons in your brain. And yet they all operate together in a (mostly) cohesive unit. It boggles the mind.

Monday, March 24, 2008

My Biggest Programming Fear

I have encountered some very strange bugs in my programming career. And most of them have been in BrainWorks, being by far the most complicated piece of software I've ever written. One of the worst involved a build that only crashed when run on someone else's system. On my system it was fine, but it would crash on my friend's system. It wouldn't crash right away, mind you. Sometimes it would take up to two minutes, so even testing to see if the bug was fixed took time. Oh, and his computer was in London while mine was in Los Angeles. Not exactly the easiest problem to solve.

I solved it by compiling in debug flags that would turn on or off entire sections of code and then had him run it. "Okay, load bots but turn off all movement, scanning, and aiming. Does it crash now? What about when Item Pickup is turned off?" Through the course of four builds that gradually narrowed in on particular blocks of code, we eventually found the offending bug.

(If you're curious what caused it, an uninitialized value was improperly being treated as an address, crashing the program whenever it was accessed. Because his machine's memory layout was different from mine, it would occasionally access invalid memory and the operating system would kill the program. For whatever reason, the uninitialized values that showed up on my system always happened to refer to memory BrainWorks was allowed to access, so it never crashed for me.)

It took about eight hours to do, but once you've solved a bug like that, you feel cabable of tackling any bug. Bugs bother me, but they don't frighten me. Know what my biggest programming fear is?

0 total errors

Any time I've spent over two hours working on a particular feature and attempt to rebuild the code base, I expect to see at least one error. I'm a good programmer, but that doesn't mean I'm perfect. I'm a good programmer because I know I'm not perfect. Good programmers assume they will make mistakes-- that's why they add all those safety error checks. If some other piece of code does the wrong thing, their error checks will contain it.

Last week I spent four hours tracking down the issues with bots overvaluing the shotgun. The problem was that the estimation of fire frequency wasn't precise enough. I spent another three hours analyzing the similarities between estimating the chance of hitting with a weapon (hits divided by attacks) and the chance of firing a weapon (attacks divided by potential attacks) and designing functionality that merged the two concepts. Actually writing the code took another 2 hours. Nine hours total including changes that could totally break the AI's ability to even attack, and here I am looking at the results from my very first recompile:

0 total errors

I haven't run it yet, but I'm terrified. Zero errors on the first try? That never happens. That's not even supposed to happen. Odds are the code I wrote has an error somewhere; I just don't know where.

At any rate, I plan to test it this upcoming week and get a new release out this weekend. Hope you enjoy it.

Monday, March 17, 2008

Release Retrospective

While people have been overall pleased and impressed with the BrainWorks release, I've received one common bit of negative feedback. Many people have said the bots use the shotgun too often, especially in situations where it's flat out the wrong weapon to use. Now I don't have the luxury other people have of saying, "that's not a bug, that's a feature!" Or more commonly, "that's working as intended." BrainWorks is intended to feel realistic. If many people say a portion of it doesn't feel realistic, they are right. I have no grounds on which to disagree. There really is a bug in the code because you say there is.

Of course, if you've ever been told quite literally, "this thing doesn't feel right, so go fix it," you know how challenging of a request that is. One downside of fundamentally basing the BrainWorks AI on causality is that debugging is extremely difficult. There's no magic number that tells the bot how often to use the shotgun. Instead, the bot analyzes its situation and incorrectly decides the shotgun is the right weapon to use. To solve this problem, I need to walk through the entire analysis and see how it reaches that conclusion. To make matters worse, the bots don't always choose to use the shotgun. And there are surely situations where the bot correctly chooses the shotgun. It's very hard to isolate the situations where the mistake as made and then narrow down why that mistake occurred.

My solution was to write some debug code that outputted all the data the bot used at each step of weapon selection analysis, starting from accuracy and fire rate data and ending with its final valuation of how quickly a given weapon would score a kill. I had it output this data for all weapons available and just stared at the numbers, checking if each data point was an accurate value of the concept representing it. I found three key contributing issues, two of which are solvable.

#1) The estimation of weapon fire rate was incorrectly bounded

Knowing how frequently the bot will fire a weapon is crucial to determining the actual damage rate of the weapon. If a bot spends 3 seconds aiming at a target and only fires for 1 of those seconds, it will do one third the damage as a bot that spends all 3 seconds aiming-- provided it's always lined up for a good shot. The bot tracks how often it actually fires the weapon, but I gave this value a lower bound of 50%. In other words, the bot assumes it will spend at least 50% of its time firing with the weapon, possibly more.

I added this assumption to deal with another problem. Bots would switch to a weapon they had never fired before, not have a perfectly lined up shot, and think... "I've spent 0 seconds firing and 100 milliseconds aiming, so I spend 0% of my time attacking. This weapon is terrible. I'm going to use something else." And they would never try out the weapon to see that they could in fact make shots with it. One problem with relying on historical data is wide variance in the initial data sets. I ran some tests with most of the weapons and found the fire rates were in the 60% to 80% range, so 50% seemed like a good bound.

The problem was that the shotgun only had a fire rate of between 30% and 50%, meaning it's hard to line up good shots with the weapon. There were situations where the bot would think it fired 50% of the time when in reality it had only attacked 30% of the time, which inflated the estimated value of the shotgun by a solid 60%. (160% of 30 is 50.) So it's no wonder they thought the weapon was good. Lowering the minimum bound to 30% had other problems though. When that happened, sometimes bots would stop using genuinely good weapons like the rocket launcher. That's a sign that a lower bound is not the correct way to solve this problem.

Related to this is the second issue:

#2) The estimation of weapon fire rate doesn't take location into account

If you're wondering why the shotgun is generally a bad weapon, it's because it's so situational. At point blank range, it can do more damage than the railgun in two thirds the time. But the spread on the weapon makes its value decrease rapidly at medium and long range. There are situations where the shotgun is good, but not many of them. In contrast, weapons like the railgun are consistently good in a wider variety of situations.

This means that a bot might shoot more often with the shotgun at point blank range than at long range because the shots are easier to line up. If the bot had a 40% fire rate with the shotgun, that might be a 50% rate when close to the enemy but only 30% when far away. Similarly, it's easy to line up shots with the rocket launcher when the target is below you, since you can just shoot at the floor. If the target is above you, you need a direct hit, and that's very hard.

If the bot knew how the firing rate could change depending on the combat situation, it would have a much better understanding of whether a weapon is a good choice against the current target. Unfortunately, the code only tracks a single firing rate for each weapon. The concept of where the enemy was located doesn't factor into the data, and that's a real issue.

The drawback of tracking fire rate data across each of the bot's 12 combat zones is that it will take much longer to get good estimates on actual fire rates. If there were problems before with bots accumulating fire rate data into one data "bucket", now that there are 12 "buckets", it will take 12 times longer before the data stabilizes. In other words, the issue that the 50% minimum fire rate was trying to address has now gotten 12 times worse.

I'm still thinking about the best way to solve this problem, but I'm leaning towards seeding the data with some reasonable estimates of how often the weapons really should be firing. If there's enough seed data, it should encourage the bot to act reasonably until it has enough of it's own data to make conclusions. That just leaves one more issue:

#3) Bots do not take into account how a good choice now could be bad later

The major drawback a situational weapon like the shotgun has is that when you use it, you give your opponent some control. They have the power to make your weapon worse just by backing up. While this is a concern for all weapons, the more situational a weapon is, the worse it is when your opponent exploits your weapon's weakness. In other words, the shotgun isn't just bad because it's only useful at close range. It's bad because your opponent can choose to be at medium or far range after you spend the time to pull out the weapon.

This is unfortunately a much harder problem to solve, since it has to do with the local level geometry. If your opponent has no escape routes, the shotgun is still excellent in close quarters. I haven't thought about this problem very much, but my intuition says that solving it in BrainWorks is well beyond the scope of the project. You might be able to analyze past reactions opponents had to your weapon choices, but it's not clear how good this data would be, how it would affect the bot's choices, and if this problem is even large enough that it needs such a sophisticated solution.

At any rate, I apologize if this post is a bit more technical than normal, but that's the nature of the work. I'm very interested to hear your ideas for how to tackle these problems. I plan on thinking through possible solutions over this upcoming week and writing the fixes next weekend. If you have thoughts on this, I'd love to hear them.

BrainWorks