Monday, August 25, 2008

Sitting on the Fence

For roughly the past month, I've been interviewing at a few different places, looking for a new job. I've felt overqualified at my current place of employment and really wanted some more challenging work. At the end of the process I had two offers to choose from, both of which were very similar to each other and far superior to my current job. I suppose most people might decide based on some simple heuristic like "highest salary", "most attractive coworkers", or "did I flip heads?"

But I'm a compulsive optimizer. Naturally I agonized for two weeks over which offer I should accept. I was hoping one offer would be clearly better than the other, but both the corporate cultures and salary packages were similar. I analyzed every angle I could think of, from the commute times to little things my interviewers had said. I called friends and got as much inside information as I could about both places. I made risk estimates for the companies based on their respective industries. Talk about overkill! My opinion changed on a daily basis, but after two weeks I finally committed to a decision.

This is, of course, the exact opposite of what good AI should do.

In most systems where a computer must analyze some data and make a decision, there's a problem when two choices are very similar. Sometimes the system can get caught in an oscillation state where working on either choice makes the other one look better. For example, a Quake 3 bot might decide to pick up armor, but the act of moving towards the armor makes it decide that picking up some health is a better choice. So the bot instead moves towards the health, and on the way decides the armor is better. The bot can end up in a state of indecision where it picks up neither of the two items, and clearly this is the worst of the three choices.
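
In its simplest form, the broken logic looks something like this (a hypothetical sketch in C, not the actual BrainWorks code; the names and scoring model are made up):

/*
 * Naive per-frame selection (hypothetical sketch).  Because the
 * bot re-scores everything every frame, and moving toward one item
 * changes all the travel times, the "best" item can flip back and
 * forth between frames.
 */
int NaiveBestItem(const float *points, const float *travel_times,
                  int num_items)
{
    int   best = -1;
    float best_rate = 0.0f;
    int   i;

    for (i = 0; i < num_items; i++)
    {
        /* Estimated value in points per second of travel */
        float rate = points[i] / travel_times[i];

        if (rate > best_rate)
        {
            best = i;
            best_rate = rate;
        }
    }

    /* May disagree with last frame's answer even if nothing
       important changed -- that disagreement is the oscillation */
    return best;
}

Nothing in that loop remembers what the bot chose last frame, and that lack of memory is exactly where the indecision comes from.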

Item pickup isn't the only situation where this occurs. Target selection and weapon selection can have the same problems. A bot can try to chase down one target but on the way decide it would rather aim at a different one, wasting valuable time aiming at neither. Or a bot can switch to one weapon and adjust its engagement range to match, but by the time it gets into position, it decides another weapon would be more useful. If you've never seen a BrainWorks bot do this, that's great! I've worked hard to remove these indecisions from the AI, but it's nigh impossible to make AI that works in all situations. There are two basic tactics you can use to solve indecision oscillations, each with its own costs.

Most of the time I give the scoring estimate a bonus of between 10% and 25% for keeping the same decision. In other words, with a 25% bonus, the AI won't change its mind unless the new option is at least 25% better than the old one. The higher this factor is, the fewer oscillations will occur. However, the risk is that an option that's genuinely 8% better will be missed, even if switching to it wouldn't cause an indecision loop. Additionally, it's rare but still possible that even with a 25% factor, the bot can get stuck in indecision. This method is best used for situations where the estimated value of one option over another doesn't change too rapidly. This is how general item pickup oscillations are handled, as well as enemy selection.
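
Here's roughly what that looks like in code (a minimal sketch with made-up names, not the actual BrainWorks implementation):

/*
 * "Decision momentum" sketch.  The current choice gets a score
 * bonus, so a challenger must beat it by a real margin to cause
 * a switch.
 */
#define MOMENTUM_BONUS 1.25f   /* new option must be 25% better */

int SelectBestOption(const float *scores, int num_options, int current)
{
    int   best;
    float best_score;
    int   i;

    /* Boost the incumbent; use an impossible score if there is none */
    best = current;
    best_score = (current >= 0) ? scores[current] * MOMENTUM_BONUS : -1.0f;

    for (i = 0; i < num_options; i++)
    {
        if (i == current)
            continue;

        if (scores[i] > best_score)
        {
            best = i;
            best_score = scores[i];
        }
    }

    return best;
}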

When a small buffer isn't sufficient, the other option is to prevent the bot from changing its mind until it completes the chosen action, or at least for a reasonably long duration (such as half a minute). This is guaranteed to solve the problem, but there are two drawbacks: not only will the bot be more likely to make suboptimal decisions, but it could also become very predictable. As such, it's best to apply this tactic only when absolutely necessary, and ideally when the action can be completed quickly.
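
A sketch of that idea, building on the SelectBestOption() sketch above (again, hypothetical names rather than the real BrainWorks code):

/*
 * Hard commitment sketch.  Once committed, the bot ignores the
 * scores entirely until the lock expires, trading optimality for
 * guaranteed decisiveness.
 */
typedef struct
{
    int   choice;        /* committed option, or -1 for none    */
    float commit_until;  /* time at which reconsideration opens */
} commitment_t;

int CommittedChoice(commitment_t *c, float now, float lock_time,
                    const float *scores, int num_options)
{
    /* Honor an unexpired commitment no matter what the scores say */
    if (c->choice >= 0 && now < c->commit_until)
        return c->choice;

    /* Otherwise pick the best option (with momentum) and lock it in */
    c->choice = SelectBestOption(scores, num_options, c->choice);
    c->commit_until = now + lock_time;

    return c->choice;
}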

This method is used in item pickup for items the bot is very close to. If a bot can pick up an item in less than one second, that item immediately overrides all other item pickups and the bot grabs it, no matter how useful some other item may be. It turns out that even with a bonus factor for keeping the same item selection, bots could still get stuck in indecision loops when they were very close to an item of low value. That's because the estimated points per second is the ratio of two very small numbers, so the estimate has a large margin of error. No bonus factor could stop the bot from switching between the bad nearby item and the good item that was far away. The code forces the bot to spend the half second it takes to grab the nearby item, thereby preventing the loop from occurring.
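
The override logic boils down to something like this (a hypothetical sketch; the real code works on BrainWorks' item structures rather than a bare array of travel times):

/*
 * Nearby-item override sketch.  Any item the bot can reach in
 * under a second preempts the normal scoring, because points-per-
 * second estimates are too noisy at near-zero travel times.
 */
#define GRAB_TIME 1.0f   /* seconds */

int NearbyItemOverride(const float *travel_times, int num_items)
{
    int i;

    for (i = 0; i < num_items; i++)
    {
        /* Grab the first item that's effectively in arm's reach */
        if (travel_times[i] < GRAB_TIME)
            return i;
    }

    return -1;   /* no override; run the normal selection */
}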

Regarding my new job, I was leaning towards the place that offered slightly less money but seemed to have higher job satisfaction. In the end they decided to beat the salary offered by the other company, so it was a no-brainer to pick the job with both higher pay and happier workers.

5 comments:

Dave Mark said...

Decision momentum, my friend. Good work.

Ben said...

You state the problem very well.

What if (and this solution assumes the necessary data is available) you implemented ethernet-style exponential backoff in your decision-making threshold? For example, after making a decision (like pick up the armor) and before the action's completion, it would take an option 5% better to make the bot change his mind. If that happens, it would take an option 10% better than that to distract him again. If that happens, 20%, 40%, etc.

Eventually, by mathematical necessity, the bot will say "To hell with all this indecision! I'm going to pick something and stick with it!"

Theoretically, that approach would let the bots select the marginally better options mid-task, unless they had already wasted too much time vacillating.
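
A rough sketch of what I mean, in C (I'm making up the names; this assumes the choice starts on a valid option and the margin starts at 5%):

/*
 * Exponential backoff threshold sketch.  Each time the bot changes
 * its mind mid-task, the improvement a challenger needs doubles,
 * so vacillation is self-limiting.
 */
typedef struct
{
    int   choice;   /* current decision                          */
    float excess;   /* required improvement over 1.0 (0.05 = 5%) */
} backoff_t;

int BackoffChoice(backoff_t *b, const float *scores, int num_options)
{
    float best_score = scores[b->choice] * (1.0f + b->excess);
    int   best = b->choice;
    int   i;

    for (i = 0; i < num_options; i++)
    {
        if (i != b->choice && scores[i] > best_score)
        {
            best = i;
            best_score = scores[i];
        }
    }

    if (best != b->choice)
    {
        b->choice  = best;
        b->excess *= 2.0f;   /* 5% -> 10% -> 20% -> 40% ... */
    }

    /* Reset excess to its base value when the action completes */
    return b->choice;
}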

Ted Vessenes said...

Ben, that's an interesting take on things. Basically you are suggesting that the constant bonus to keeping the same decision be changed to a strictly monotonically increasing function. This is clearly a superior solution, except for one problem. What function do you use?

I can see compelling reasons to use linear, logarithmic, and exponential functions, and that pretty much covers the whole range of monotonic function shapes. Picking the best slope and rate of amplification is a hard problem. You end up in the "Getting it Just Right" problem I wrote about a few months ago.

I think that means this technique is great for stuff that matters a lot but should be avoided for less important things, purely in the interest of reducing development time.

By the way, Dave, I really like that phrase: "Decision momentum". Does a great job of explaining the concept. I wish I'd thought of it!

Dave Mark said...

Ben,

You realize that you are just simply going to have to buy my upcoming book, "Behavioral Mathematics for Game AI", right?

;-)

Anonymous said...

Hi, wish the download was still up!

One of the things that bugs me the most with default q3a bots is their clairvoyance regarding pickups. They don't time items, or apparently make any judgments about how likely an item is to still be around when they get there; nor, of course, do they use the fact that an item is present or absent to make any predictions about the proximity of other players/bots.

I was looking around for any posts regarding how you handled this, or if you did at all, but didn't see anything. It seems to me like it could involve some interesting reasoning.