Monday, October 13, 2008

The Theory Of Fun

Last week I ended with a simple question: Does making a game more balanced make it more fun? And like so many other things in life, the answer is both simple and complicated:

It depends.

To put it briefly, it depends on the context of the game and the reasons people are playing it. While the original Doom appears similar to Quake 3 on the surface, they are polar opposites from a game design perspective. Doom is all about mowing down armies of generally stupid demons with your increasingly diverse stockpile of weapons. Quake 3 is about fighting a skilled opponent with access to the same weapons and powerups you can use. Doom is a visceral game while Quake 3 is tactical. Rather than saying Doom is fun, it's more accurate to say that visceral experiences are fun, and Doom happens to fill that role. But a game like Serious Sam is just as fun, and in the same way.

If games are candy for the mind, then whether or not you like a particular flavor of game is a question of personal preference. What does your mind enjoy? Each kind of thinking corresponds to a different style of game.
  • Analysis
  • Action
  • Discovery
  • Creativity
  • Socialization
  • Identity
As best I can tell, these are the six basic thought elements that make games fun. Allow me to explain them in more detail:

Analysis

Examples: Chess, Go
Requires: Discrete set of Options

People who like analytical games approach them as a puzzle to be solved. The game needs to provide decisions for the player to make, any of which could be helpful in the right situation. The fun is in selecting your decision and finding out if it was a good choice. Winning the game usually means you were right, or at least more right than your opponents. Analytical games can also be single player games, such as a Rubik's Cube. They need not offer perfect information to all players the way Chess and Go do, of course. A game like Poker contains a lot of analysis, but the analysis is in large part about probabilities, risks, and rewards. You can win a game of poker by making the wrong play, or lose making the right play, but people who deduce the right plays more often will win more often, and that's what makes the game analytical.

Action

Examples: Golf, Football
Requires: Physical Interaction

An action game is anything that involves a physical component. If there is ever a game situation where you know what you need to do but your muscles might not be able to do it, it's an action game. The joy in an action game is seeing how well you can do what you tried to do. It's really satisfying to sink a three point shot in basketball, score a goal in football, or hit a home run in baseball. Every physical sport is an action game by this definition. But other games have action components as well. All first person shooter games are "action games" because they require you to aim at your target and shoot. You know what you want to do, but your muscles might fail at moving the mouse to the right place. Even a game like Jenga is an action game, because it depends on precise muscle control.

Discovery

Examples: Final Fantasy VII, World of Warcraft
Requires: Artistry

Discovery games are fun because the player experiences new things. Their joy is the joy of appreciation. A common theme in these games is a story line. As a result, discovery games generally have lower replayability, but they are highly engaging the first time through. What the player experiences doesn't have to be the story, of course. It could be enjoying the look and feel of a new World of Warcraft zone. What all discovery games have in common is artistically designed content that the player enjoys experiencing.

Creativity

Examples: Charades, Magic: The Gathering
Requires: Very large set of Options

Some people really enjoy thinking outside the box. To become a medium for expressing creativity, the rules of the game must be extremely flexible, allowing for a wide variety of possible options. The number of options is far larger than that of an analytical game: it needs to be so large that players cannot exhaustively try all of the alternatives. This gives creative people the option to express themselves by making choices other people might not even have considered. Creative games often have a verbal component to them, but this is not necessary. For example, deck design in Magic: The Gathering is a very creative endeavor. There are more legal Magic decks than atoms in the Universe! And a number of Magic players enjoy making decks more than playing with them.

Socialization

Examples: Mafia, Shadows over Camelot
Requires: Cooperation

While it's true that games are only as fun as the people you play them with, some people explicitly enjoy games because of the social interactions. In their opinion, a game's purpose is to generate social interactions, and whether you win or lose isn't as important as how you got there. These games categorically encourage players to work as a team towards a common goal. Victories and losses are shared. The fun is about being part of a group and, for some games, deducing who isn't really a member of the team from their actions.

Identity

Examples: Counterstrike, Team Fortress 2
Requires: Winners and Losers

There are some gamers who play games to win. Obviously the typical sore loser falls into this category, but that's not the only type of person who fits this description. This type of gamer considers winning an expression of who they are. The sore loser is someone who considers that expression to be a validation of identity-- they are distressed when they lose because the loss injures their ego. By definition, every game has winners and losers. But in almost all cases, gamers who want to win prefer games against human opponents, as it makes winning more meaningful. Note that these gamers need not be good players. Often bad players who want to win will gravitate towards teamplay games like Counterstrike. No matter how bad you are, you will win roughly half of your games if the teams are large enough. When a gamer wants to express themselves through winning, all they really need is a human opponent and the plausibility to claim they won through skill. Actual skill could have been involved, but it's not required.

Monday, October 6, 2008

Less Is More

Previously I wrote about a Quake 3 modification I made named Art of War, and how it was the inspiration for BrainWorks. But it wasn't the first Quake 3 mod I made. That honor belongs to an unreleased game variant simply titled Less. The concept of Less is simple: all items are less powerful, providing less health or ammo per pickup. It's the opposite of Excessive Quake, which is fun because, well, who wouldn't want to be uber powerful? Doesn't that mean Less is less fun to play?

I believe that when done correctly, less is more. The actual numbers on an item don't mean anything; only the relative values matter. For example, you could multiply all health, armor, and damage values in Quake by 1000 and the game play wouldn't change. Players would start with 125,000 health and each rocket would do up to 100,000 damage. Strategically nothing has changed. I call this "Pinball Inflation", as pinball games have been adding extra zeros for decades. Some tables now have scores that reach the trillions.

Online RPGs like World of Warcraft have a similar issue, where you can gain levels and deal extra damage, but you fight monsters that have more health. The difficulty of the fights doesn't change that much. At best you might gain a totally new kind of ability as you gain a level, which gives you another tool at your disposal, and that's the only real way the difficulty increases. So for MMOs, the number inflation game is a method of unlocking more content. Players would be daunted if the first time they picked up a class of character, they had 30+ different abilities to choose from. It's better to start them with 3 to 5 and have them gradually learn more.

All that said, it's crucial to realize that gaining levels doesn't actually make your character more powerful, relative to the content you're doing. Sure you might kill goblins in 2 hits instead of 3, but you'll eventually move onto killing stronger goblins. The primary purpose of gaining levels is letting you experience more content in the game while maintaining the same difficulty.

I started the design of Less from a similar standpoint: I wanted Quake 3 to have more content. Reducing the benefit of each item might seem like a strange way to add content, but follow the logic. A typical game of Quake 3 involves controlling the two to five best items on the map: red and yellow armor, megahealth, and powerups like quad damage and haste. But the level also holds dozens of items that rarely matter, things like boxes of ammo and weapons. You'd think weapons would matter more, but when they respawn every 5 seconds after pickup, pretty much everyone has any weapon they want. And the ammo a weapon provides makes that weapon's associated ammo box irrelevant. There may be 50 items on a level, but only 3 of them matter for a typical game.

The concept of Less was to make all items have a roughly equal play value on average. Then the best players would be those who knew which item was most important in the current situation, and what their opponent needed the most. You could end up with a game where the winner was the person who best controlled a box of rockets and a box of machinegun bullets. And that adds a deeper level of strategy and tactics, thereby adding more gameplay to the game.

You might be surprised, but it's actually possible to tweak the numbers for each item to make this work. The game plays like Bizarro Quake 3, where you start timing the respawn of ammo boxes and small health balls in addition to armor and quad damage. It's definitely the same game on the surface, but the high level strategy of how you play the map is totally different.

While I don't have the source code in front of me, here are some of the changes made, to give you a sense of how dramatically the gameplay shifts.
  • Railgun Weapon: Provides 2 slugs (3 seconds) instead of 10 (15 seconds)
  • Railgun Slug Box: Provides 5 slugs (7.5 seconds) instead of 10 (15 seconds)
  • Lightning Weapon: Provides 24 ammo (1.2 seconds) instead of 100 (5 seconds)
  • Lightning Ammo: Provides 40 ammo (2 seconds) instead of 60 (3 seconds)
  • Quad Damage: Lasts for 8 seconds instead of 30
  • Red Armor: 50 armor instead of 100
  • Yellow Armor: 25 armor instead of 50
  • Orange Health: 25 health instead of 50
  • Yellow Health: 15 health instead of 25
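For the technically inclined, here's a rough sketch of how overrides like these could be wired into the game code at startup. The item classnames and the quantity field follow my recollection of the stock Quake 3 item table (bg_itemlist in bg_misc.c), so treat the specific identifiers as assumptions rather than the actual Less source:

```c
/*
 * A minimal sketch, not the actual Less source. The classnames and the
 * "quantity" field are assumptions based on the stock Quake 3 item table,
 * where quantity controls the pickup amount or, for powerups, the duration
 * in seconds.
 */
#include <string.h>

/* Cut-down stand-in for Quake 3's gitem_t; the real struct has more fields. */
typedef struct {
    char *classname;   /* entity name used in map files */
    int   quantity;    /* pickup amount or powerup duration */
} gitem_t;

typedef struct {
    const char *classname;
    int         new_quantity;
} less_override_t;

static const less_override_t less_overrides[] = {
    { "weapon_railgun",    2  },  /* was 10 slugs */
    { "ammo_slugs",        5  },  /* was 10 slugs */
    { "weapon_lightning",  24 },  /* was 100 cells */
    { "ammo_lightning",    40 },  /* was 60 cells */
    { "item_quad",         8  },  /* was 30 seconds */
    { "item_armor_body",   50 },  /* red armor, was 100 */
    { "item_armor_combat", 25 },  /* yellow armor, was 50 */
    { "item_health_large", 25 },  /* orange health, was 50 */
    { "item_health",       15 },  /* yellow health, was 25 */
};

/* Walk the item table once at startup and patch in the reduced values. */
void Less_ApplyOverrides(gitem_t *items, int num_items) {
    int i, j;
    int num_overrides = (int)(sizeof(less_overrides) / sizeof(less_overrides[0]));

    for (i = 0; i < num_items; i++) {
        for (j = 0; j < num_overrides; j++) {
            if (strcmp(items[i].classname, less_overrides[j].classname) == 0) {
                items[i].quantity = less_overrides[j].new_quantity;
                break;
            }
        }
    }
}
```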
When playing Less, you have this constant sense of never having enough of anything, and that in turn creates a sense of fear and tension. The big question is, though, "Do these changes make a better game?" I'm curious what other people think, and next week I'll share my own thoughts on the difference (and connection) between making a game balanced and making a game fun.

Monday, September 29, 2008

Anatomy Of The Brain

Last week I wrote in brief about the differences between conscious and non-conscious thought, and how experiments show humans use both. Even the most primitive animal brain is an extremely complicated organ, and the human brain is obviously the most advanced brain we've encountered. Conceptually there are three main sections of the human brain, corresponding to the evolutionary processes that produced the brain.

Pardon the aside, but I wanted to say a word to the Christian readers who believe in Intelligent Design instead of evolution. Feel free to interpret "evolution" as "how God chose to create life". Your personal belief about how humans came to exist won't change your interpretation of what I have to say, so don't let my choice of language get in the way. Remember, I was a Christian for 30 years. I was taught creationism by my parents. In high school I believed the theory of evolution, and just said "God used evolution to create humans". Now I just think "humans were created through evolution". I still think people who believe in Intelligent Design are wrong given the data, but I see no reason to be condescending or judgmental about it.

As I was saying, there are three main sections of the brain. At the core is the so-called "Reptilian Brain", or more formally the brain stem. This section handles basic reflexive responses such as "fight or flight", mating instincts, and the fear of other species. It also handles exactly one emotion: rage. Next is the Mammalian Brain, or Limbic system. This is the area of the brain that handles all other emotions, as well as concepts like family, culture, and attachments. Some aspects of conscious thought and self-identity are handled by the mammalian brain as well. Last is the Neo-cortex, which is responsible for higher level thought such as speech, reasoning, imagination, and speculation.

While there's a clear physical boundary between the reptilian and mammalian brains, the division between mammalian (limbic) and neo-cortex is not as clear, and there seems to be a stronger bleed between the functions. For example, some conscious thoughts are handled by the mammalian brain while others are handled by the neo-cortex.

Humans have all three brain sections, as do other primates and the larger mammals. Smaller mammals such as rodents do not have a neo-cortex, although they do have the limbic system and reptilian brain. And as the name implies, all reptiles have the reptilian brain, but lack the mammalian brain and neo-cortex. So the brain sections correspond to different evolutionary forks. The primary feature that separates mammals from reptiles is not hair or internal gestation, but a more advanced brain.

If that's true, then reptiles don't actually have conscious thought. They simply respond to stimuli in the same fashion every time. But mammals, having a memory, can learn from past experiences and modify their behavior. Perhaps this is how mammals outlived the dinosaurs after a natural disaster hit the earth. Dinosaur brains weren't programmed to handle the "meteor crashed into the earth and all your normal food dies" situation, whereas mammals could learn to find new food sources.

In computer science terms, the reptilian brain is analogous to a state machine, or a simple circuit board. It's pure hardware, and if you give it the same set of inputs, it produces the same set of outputs every time. It has no memory. The mammalian brain is more like a simple computer program, in that its responses are based both on sensory input and on past memories and experiences. Running the same computer program multiple times might produce different results. Mammals have the ability to learn throughout their life while reptiles do not. And since the neo-cortex handles imagination, reasoning, and "what if?" scenarios, it's closer in function to an operating system. An OS can run multiple programs in parallel, similar to the neo-cortex's ability to think about multiple thoughts at the same time, even conflicting thoughts.
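To make the analogy concrete, here's a toy sketch, with all names invented for illustration, contrasting a memoryless reflex with a response that depends on accumulated experience:

```c
/*
 * Toy illustration of the analogy only, not neuroscience. The "reptilian"
 * reflex is a pure function of its input, so the same stimulus always
 * produces the same response. The "mammalian" response also consults stored
 * memory, so the same stimulus can produce different responses over time.
 */
#include <stdio.h>

/* Reflex with no memory: a fixed input-to-output mapping. */
static int reptile_react(int threat) {
    return (threat > 5) ? 1 : 0;          /* 1 = flee, 0 = ignore */
}

/* Learned response: behavior shifts as bad experiences accumulate. */
typedef struct {
    int bad_experiences;                  /* crude stand-in for memory */
} mammal_brain_t;

static int mammal_react(mammal_brain_t *brain, int threat, int got_hurt) {
    if (got_hurt)
        brain->bad_experiences++;
    return (threat + brain->bad_experiences > 5) ? 1 : 0;
}

int main(void) {
    mammal_brain_t brain = { 0 };
    int first, second;

    /* Identical stimulus, identical reflex, every time. */
    printf("reptile: %d %d\n", reptile_react(4), reptile_react(4));

    /* Identical stimulus, but the response changes as memories build up. */
    first  = mammal_react(&brain, 4, 1);
    second = mammal_react(&brain, 4, 1);
    printf("mammal:  %d %d\n", first, second);   /* prints "0 1" */
    return 0;
}
```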

This biological framework provides an interesting context from which to answer the questions "What are emotions?" and "Can a computer have emotions?" Some day I'll write on that, but there's a lot more to say than should be stuffed at the end of this column. In the meantime you'll just have to speculate.

Monday, September 22, 2008

Armchair Neuroscience

I'm really grateful for my undergraduate education at the University of Chicago. I believe it's the best liberal arts school in the country, in large part because it lacks the name recognition of Harvard or Yale while still placing academically in the top ten undergraduate institutions in America. No one goes to the University of Chicago for the fame or connections they might get at Harvard, and as a result the school attracts only those students who really want to learn. At the University of Chicago, they teach you how to think and analyze everything.

The joke in college was that a University of Chicago student didn't need to know anything about a subject to have an opinion on it. This is pretty much true. I distinctly remember getting an A on a paper about The Brothers Karamazov when I had only read the first quarter of the book. The trick was to relate the concepts discussed in class with the material I had actually read and with other texts read in that class. While I'm sure my professor would have been mortified to find this out, it says a lot about learning to analyze abstract concepts and come to conclusions that happen to be right.

In other words, The University of Chicago taught me to bullshit well.

Most of the time I talk about a topic, I really do understand the subject matter. But I'm not willing to let a lack of expertise or study prevent me from sharing my opinion, so today's topic is neuroscience. As a disclaimer, I've never studied this subject in an official capacity in my life. All the information I have is from personal reading into other studies, most of which are (I think) peer reviewed.

There! That's a more accurate disclaimer than you'll hear from any journalist, and yet they are often no better qualified than I am to write about these topics. In particular, I recently read an article on msnbc.com that hypothesized that your brain could be controlled by an inner zombie.

Yes, that's really their conclusion. And by "conclusion" I mean "hypothesis supported by circumstantial evidence that's phrased as a question so no one can refute your claim because you're not officially making one". Since there's no challenge in ridiculing a claim like this, let me instead give a fair and impartial summary of the article. Then I'll rip it to shreds and propose a hypothesis that better fits Occam's Razor by several orders of magnitude.

I'll admit "inner zombie" is just a phrase chosen to generate interest in the article. And in some sense it worked-- I'm linking to it. All they really mean by inner zombie is a thought process that's not conscious and not (always) connected to rational decision making. The article cites some studies where people are blinded by deactivating the visual processing section of their brain and then shown a word. The people have no conscious knowledge of the word, but when they are asked to "guess" what word might have been shown, they guess correctly with much higher than expected accuracy. There's also a study showing that people who are temporarily blinded can still reflexively react to visual information they can't see. And while the article doesn't mention it, there's also a study of people who are blind because their brain doesn't process images even though their eyes are fine. Compared to other blind people, these people do a much better job of dodging obstacles, like other people or sign posts, while walking down the street.

I'd like to note that these results are all from real, peer reviewed studies. Scientists aren't questioning the validity of these results. But they also aren't jumping to the same conclusion that MSNBC did. Here's the basic argument the article makes:
  1. People show the ability to reflexively process and react to visual information even when they consciously report blindness.
  2. Therefore an unconscious process has control over their conscious minds.
  3. Perhaps all consciousness is controlled by these unconscious processes and consciousness is a myth.
It's step 3 that doesn't make any sense. Sure, humans have unconscious processes that handle the same information conscious sections of the brain handle. But why does that mean the whole brain is controlled by the unconscious part? Given our daily experience, I'd suggest that the reverse is true. Our general actions are primarily controlled by conscious processes, and only when consciousness shuts off do the unconscious sections take control. There's no reason to suggest consciousness is a myth and a "zombie" is "controlling your brain".

But the brain is a really strange organ. The reasons the brain works this way are rooted in the differences between human brains, standard mammal brains, and the reptilian brain. Next week I'll talk more about these things and explain why I think dogs have consciousness but lizards do not.

Sunday, September 14, 2008

The Last Place You Look

When I was growing up, I always found it ironic that lost things turned up in the last place I looked for them. For some reason, whenever I was looking for a lost toy it was never in my bedroom or the play room. I'd find it outside underneath the deck, in the basement, or inside the sofa. It wasn't until I was older that I realized this wasn't coincidence; it was a mathematical truth. Put another way: once you find what you're looking for, you stop looking. Things are guaranteed to be in the last place you look.

That's not the same as saying all places are equally likely to hold the lost object, of course. If I'm looking for the TV remote, it's less probable that the dog took it as a chew toy than that the remote fell between sofa cushions. And it's far less probable that the remote is now part of an exhibit in the Smithsonian Institution.

The general purpose search algorithm that humans use for locating lost objects involves prioritizing possible locations from most to least likely and searching them roughly in that order. The actual order will get rearranged to save travel time. For example, if an object is most likely to be on top of your dresser, second most likely to be on the bathroom counter, and least likely to be under the bed, you'll still probably check your dresser first and under the bed second before heading to the bathroom. But in general the search is from most likely to least likely. That also explains why the place you find an object is always the last place you'd think to look. The optimal searching algorithm stipulates that you search in the least likely places last.

So this brings me back to last week's topic of octrees. I described how to build the structure that BrainWorks uses to determine which item a player is nearest. But I haven't actually explained how BrainWorks uses the octree to do this. Consider this two dimensional tree (a quadtree):


[Figure: example quadtree. Items are colored dots (including a red root, brown, purple, blue, and green items), the player is the black dot, and circles mark the shrinking search radius as the algorithm progresses.]

Each item in the tree is one of the colored dots. The player is the black dot with the circles around it. Every rectangle refers to a particular subsection of the tree. In this example, the red dot is the root node of the tree. The lower left section of the tree is divided by the brown dot into four sub-regions, and the brown dot's lower left region is further subdivided because it contains another item (the purple dot). The circles represent the boundary of the potential search space as the algorithm progresses.

When the search algorithm first starts, it checks the distance from the root node to the player. This distance defines a circle around the player. If an item is closer to the player than the root node is, it must be inside that red circle. At this point the algorithm doesn't know that the green or blue dots are inside the circle. It doesn't even know that no items in the lower left quadrant are closer than the red (root) item is. But it can mathematically guarantee that nothing in the upper right quadrant could be closer, so that quadrant is ignored.

Because the player is in the lower left quadrant, the algorithm prioritizes search in that section first. This is the quadrant that contains the greatest area of the red circle. The next most likely place to look is the lower right quadrant, as the player is closer to the lower right quadrant than the upper left. The upper left quadrant is scheduled third. So the areas to check are, in order:
  • Lower Left
  • Lower Right
  • Upper Left
Then the algorithm merely recurses. First it checks the brown item in the lower left. A quick distance check shows the brown item is too far away, but the search continues on the brown region's subquadrants of upper right, then upper left, then lower right. As none of these regions have any items, the search of the brown region terminates quickly. And at this point in time, the red item is still the closest object to the player.

When the algorithm starts checking the lower right quadrant of the red item, it notices the blue item is closer. So from now on, it compares all potential search areas against the blue circle instead of the red one. No further items exist in the lower right quadrant, so the blue item remains the closest item to the player.

Last, the algorithm wants to check the region to the upper left of the red (root) item. However, when compared to the new blue circle, the search algorithm realizes that the region does not intersect the circle. Since the circle is the area of potentially closer items, nothing in the upper left can be closer than the blue item, so the entire upper left region is ignored. And since all regions were prioritized by distance to the player (from closest to farthest), the search process is done. All potential items have been checked, so the blue item is the closest. The algorithm doesn't even need to check how close the green item is to the player because that entire upper left quadrant was skipped.

The octrees in BrainWorks function the same way, only in three dimensions instead of two, and using spheres instead of circles. That answers last week's question of why the algorithm might need to search up to 7 subregions to find the optimal solution. The closest item isn't necessarily in the same section of the octree as the player, but the diametrically opposed region can always be ignored. That leaves 7 of the 8 regions to check. In practice most regions are quickly pruned away once a closer item is found, in the same way the region containing the green item was ignored when the blue item was discovered. Even though the example tree contained seven items, only three of them needed to be checked (Red, Brown, then Blue) before the closest item could be found.
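For anyone who wants to see the search spelled out, here's a self-contained sketch in C. It uses a simplified node layout and invented names rather than the actual BrainWorks structures, but it follows the same prioritize-and-prune logic described above:

```c
/*
 * Sketch of the nearest-item search. Each node stores one item location and
 * divides space into eight octants around it. The search visits the most
 * promising octants first and prunes any octant that cannot possibly hold
 * something closer than the best item found so far.
 */
#include <math.h>
#include <stddef.h>

typedef struct octree_node_s {
    float                 point[3];   /* item location stored at this node */
    struct octree_node_s *child[8];   /* one subtree per octant, or NULL */
} octree_node_t;

static float dist_sq(const float a[3], const float b[3]) {
    float dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return dx * dx + dy * dy + dz * dz;
}

/*
 * Smallest possible squared distance from query q to octant "oct" around the
 * dividing point p. On each axis where q is on the wrong side of the dividing
 * plane, it must travel at least that far to enter the octant. This is the
 * 3D version of asking "does the circle reach into that quadrant?"
 */
static float octant_min_dist_sq(const float p[3], const float q[3], int oct) {
    float d = 0.0f;
    int axis;
    for (axis = 0; axis < 3; axis++) {
        float delta = q[axis] - p[axis];
        int wants_high = (oct >> axis) & 1;   /* octant holds points on the high side */
        if ((wants_high && delta < 0.0f) || (!wants_high && delta >= 0.0f))
            d += delta * delta;
    }
    return d;
}

static void search_nearest(const octree_node_t *node, const float q[3],
                           const octree_node_t **best, float *best_d) {
    float key[8];
    int order[8], i, j;
    float d;

    if (node == NULL)
        return;

    /* Check the item stored at this node against the best found so far. */
    d = dist_sq(node->point, q);
    if (d < *best_d) {
        *best_d = d;
        *best = node;
    }

    /* Schedule the eight octants from most to least promising. */
    for (i = 0; i < 8; i++) {
        key[i] = octant_min_dist_sq(node->point, q, i);
        order[i] = i;
    }
    for (i = 1; i < 8; i++)                          /* insertion sort, 8 entries */
        for (j = i; j > 0 && key[order[j]] < key[order[j - 1]]; j--) {
            int tmp = order[j];
            order[j] = order[j - 1];
            order[j - 1] = tmp;
        }

    /* Recurse, skipping any octant the current best-distance sphere misses. */
    for (i = 0; i < 8; i++) {
        if (key[order[i]] >= *best_d)
            continue;
        search_nearest(node->child[order[i]], q, best, best_d);
    }
}

/* Entry point: find the item location nearest to the player. */
const octree_node_t *nearest_item(const octree_node_t *root, const float player[3]) {
    const octree_node_t *best = NULL;
    float best_d = INFINITY;
    search_nearest(root, player, &best, &best_d);
    return best;
}
```

Note that the diametrically opposed octant is pruned automatically: its minimum distance to the query equals the distance to the dividing item itself, so it can never beat the current best.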

Monday, September 8, 2008

Dude, Where's My Rocket Launcher?

Back in March I wrote about how the AI conceptualizes item placement. Rather than thinking of position in terms of raw coordinates, bots consider each item (or cluster of nearby items) to be a potential landmark. When statistical tracking is done for areas players are likely to be in, for example, they are grouped by which item they are closest to. Statistical tracking is one of the most powerful tools available to AI if you have a sufficiently large and well organized data set. BrainWorks uses this data to analyze which areas of the level a player is most likely to be in, for example. Not only can this information help bots track down enemies; it also helps a hurt bot avoid enemies. The more data it can track, the better. So by extension, the faster it can translate from raw coordinates to the nearest item, the better. However, this is not a simple request.

If a level has N items in it and a player is located at position (X, Y, Z), what's the fastest way to determine which item is closest to that player?

Well, you could just check the distance to every item and use whichever is closest. If you read last week's column, you'll remember that this search requires linear time: if you double the number of items on the level, it will take twice as long to find the closest one. Certainly we can do better than linear time.
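Here's what that brute-force approach looks like in C, just to set a baseline. The item array and field names are illustrative, not the actual BrainWorks data layout:

```c
/* Naive linear scan: check every item and remember the closest. */
#include <float.h>

typedef struct {
    float origin[3];   /* item location on the level */
} item_t;

int ClosestItem_Linear(const item_t *items, int num_items, const float player[3]) {
    int i, best = -1;
    float best_d = FLT_MAX;

    for (i = 0; i < num_items; i++) {
        float dx = items[i].origin[0] - player[0];
        float dy = items[i].origin[1] - player[1];
        float dz = items[i].origin[2] - player[2];
        float d = dx * dx + dy * dy + dz * dz;   /* squared distance is enough to compare */
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;   /* index of the nearest item, or -1 for an empty list */
}
```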

There are a number of data structures that can solve this problem. Some of them might require more setup time but give slightly faster search time. The structure I chose is an octree. It's a tree that holds some number of three dimensional locations. If you want to learn more, I recommend the Wikipedia article on quadtrees as well. A quadtree is the two dimensional analog of an octree, so it's a bit easier to visualize.

In the example I gave last week of a filing cabinet holding 100 pieces of paper, 1 cabinet held 10 folders and each folder held 10 papers. This is a total of 111 pieces of data stored (100 papers + 10 folders + 1 cabinet). In computer science terms, each data storage location in a tree is called a "node". The node where searching begins is called the "root node". In the filing cabinet example, the filing cabinet itself is the root node. The cabinet and folder nodes split 10 ways, and the pieces of paper do not contain any sub-nodes. Nodes that do not further branch are called "leaf nodes".

Getting back to the octree example, the BrainWorks AI code needs to store the spatial locations of 50 or even 200 items in a tree. Each node in the octree contains one of the stored points, and the node branches 8 ways. Visually, think of each node as dividing space into 8 octants-- upper north west, upper north east, upper south west, upper south east, and four more counterparts for the lower sections. All points that are above, north, and west of the root node are stored underneath the upper north west node, and so on for all other octants.

For example, suppose there are 50 items on a level. One of them is stored in the root-- say it's a rocket launcher. Of the 49 remaining items, 6 of these item locations are above, north, and west of the rocket launcher. The root node will link to a node for one of these six items-- a megahealth. The other five items will be stored underneath that megahealth.

Again, the octant division property applies to all nodes, not just the root. So the megahealth also divides its space into eight octants, and the five remaining items will be appropriately placed underneath it. Keep in mind that each of these items is guaranteed to be above, north, and west of the rocket launcher. But items that are above, north, and west of the megahealth will be stored in a different octant than those above, north, and east of it. Given that there are eight possible areas and only five remaining items to store, chances are that each of these items will end up directly under the megahealth in its own octant. (If two items happen to both be below, north, and east of the megahealth, then the first will be stored under the megahealth, and the second will be stored under the first.)

Searching a tree is generally proportional to the depth of the tree-- the greatest distance from the root node to a leaf node. And for a tree of 50 items, the depth is usually 3 or 4. Going up to 100 items might push the depth to 5 for some level layouts. So the octree structure makes answering the question of "which item am I nearest?" much, much faster than checking all 50 or 100 items.

Given that items have fixed locations on the level, the octree only needs to be created once when the level starts, and the tree never changes during the game. (Ignore items that drop from dead players-- those are handled differently and not useful for statistical tracking anyway.) The actual construction algorithm is pretty simple, a bit slow, and very likely to produce a well balanced tree. Here's how it works.

Given a set of items:
  • Average together all their positions to find the center
  • Check each item to find which one is closest to the center
  • Add that item as the dividing node
  • Separate all remaining items into the eight octants defined by that dividing item
  • Repeat the process on each octant that contains at least one item
By "not terribly fast", this algorithm could take up to half a second on a very, very large map. So still inconsequential, and well worth the one-time hit to produce a better tree.

Of course, there's still the question of how you search the tree to find the nearest item, and that I will be explaining next week. Here's some food for thought though: even if you identify which octant a point would reside in, you still need to search up to seven of the eight octants to guarantee you've found the closest item to your input point. Searching this tree is not done like most tree searches. For those of you who like thinking through these kinds of problems, try to figure out how the search algorithm actually works. I'll give the answer next week.

Given a set of item locations arranged in an octree and one player location, describe an algorithm to find the item nearest the player.

Good luck!

Monday, September 1, 2008

Needle In A Haystack

For a long time, I kept all my important documents in a single box. That would be stuff like my passport, car title, insurance information, and so on. After my wife and I bought a home, we had enough papers to file that our simple box wasn't sufficient. We bought a large filing cabinet, and it's served our needs ever since.

It's not that we needed to look up stuff any more frequently than before. But when we went from 100 pieces of paper to 1000, we didn't want it to take 10 times longer to find anything. To find stuff in the box, I riffled through it until I found the paper I needed, sometimes searching the entire thing. More stuff means the search takes longer.

A filing cabinet solves this problem by providing a natural structure for organizing the papers. For example, all the papers on the home can go in one file folder, all the car information in another, and all the identification documents like passports in a third. Then the lookup process becomes simpler, as I can ignore everything that's not part of the folder that keeps whatever I'm looking for.

This is a real savings in time. Imagine you have 100 different pieces of paper. You can either stuff them all in one box or you can organize them into 10 folders, each containing 10 papers. What's the average number of papers you'll have to look at to find any given piece, assuming you always know what folder should contain that piece of paper?

When looking through the box, you could look at anywhere between 1 and 100 pieces with equal probability. So you'll average (1 + 100) / 2 = 50.5 pieces before you find it. When looking through the folders, you have to find the correct folder from 10 options and then the correct paper of the 10 pieces in that folder. It will take between 1 and 10 tries to find the folder, or 5.5 on average. And similarly it takes on average 5.5 tries to find the paper in the folder. That's 11 tries total for the filing system compared to 50.5 for the box. The filing system can be searched roughly five times faster!

Or is it really five times faster? What happens if you have 1000 pieces of paper instead of 100? When you have this much information, you could create folders and subfolders to further organize the papers. If you have 10 main folders, each of which contains 10 subfolders, and each of those contains 10 papers, then it will take 5.5 + 5.5 + 5.5 = 16.5 tries to find any one of the 1000 papers. It would take 500.5 tries if they were in a box. So in this case the filing system is about 30 times faster.

The more data there is to store, the more efficient the filing system is compared to the box. That's because the filing system doesn't just shave off a constant factor; it scales in a fundamentally better way. In mathematical terms, searching through the box takes linear time: when you add ten times as much stuff, it takes ten times as long. But searching the filing system takes logarithmic time: adding ten times as much stuff just means one more search step. Very roughly speaking, the search time is proportional to the number of zeros in the size of the data. Searching 100 things takes twice as long as searching 10 things, and searching 1000 things takes three times as long. Want to search 1,000,000 things? That's only six times as long as searching 10 things.
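If you want to check those numbers yourself, here's a tiny C program that computes the averages for a box versus a 10-way filing hierarchy. It's purely illustrative:

```c
/* Back-of-the-envelope check: expected lookups for a box vs. 10-way folders. */
#include <math.h>
#include <stdio.h>

/* Average tries to find one of n equally likely papers by scanning: (1 + n) / 2. */
static double box_lookups(int n) {
    return (1.0 + n) / 2.0;
}

/* With 10-way folders nested until each folder holds 10 papers, each level
 * averages (1 + 10) / 2 = 5.5 tries, and there are log10(n) levels. */
static double filing_lookups(int n) {
    return 5.5 * log10((double)n);
}

int main(void) {
    int sizes[] = { 100, 1000, 1000000 };
    int i;
    for (i = 0; i < 3; i++)
        printf("%8d papers: box %.1f tries, filing %.1f tries\n",
               sizes[i], box_lookups(sizes[i]), filing_lookups(sizes[i]));
    return 0;   /* prints 50.5 vs 11, 500.5 vs 16.5, 500000.5 vs 33 */
}
```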

You might be wondering what all of this has to do with artificial intelligence. At the heart of almost all AI is data, and data doesn't mean anything if you can't find it when you need it. That means efficient data storage is the required foundation for artificial intelligence. An algorithm is only as good as the data it has access to, and good AI needs a lot of data. If all the data were just shoved in a box, there's no way the computer could run the AI's thought process fast enough. Indeed in almost all computer science, the way in which data is organized is just as important as the actual data being stored. After all, what good is information if you can't find it? Next week I'll give a concrete example of how BrainWorks puts this concept to use.