Wednesday, 14. June 2006

He thinks so, too

The theory of computability is really the mathematics of the natural numbers and finite mathematical induction.

(R. P. Loui, "Some Philosophical Reflections on The Foundations of Computing", 1998)
Finally, I've found a mathematician (well, an engineer) who seems to see what I see. Since natural numbers and finite mathematical induction don't represent a very powerful toolbox for "intelligence" to pull from, most researchers and developers in AI don't want to accept them as their limit. For more than 50 years now, they've tried to find new tools. Were any new tools found? No - now, as then, computation is just rule-following, and any program you can write in Ruby on Rails, you can write in Assembler. Have people stopped trying? No.

But ultimately, they will. And when that happens, and people start getting creative within that limit, this whole AI thing will get so much more interesting ;-)

Tuesday, 13. June 2006

Duh!

You know something? Optimal simulation of storytelling is NP-hard for Grand Argument Stories.

Proof sketch: A GAS can be encoded as an extended regex (a regular expression with backreferences to groups captured earlier in the match), so simulating a GAS amounts to matching such a regex. Regex matching with backreferences is NP-hard, because the GRAPH 3-COLORABILITY problem, which is known to be NP-complete, can be reduced to it. Assuming the encoding works in both directions, a GAS is equivalent to GRAPH 3-COLORABILITY in computational complexity: NP-hard (at least).
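That reduction can even be run. Here's a minimal sketch in Python (the function name and the little example graphs are mine, just for illustration): it builds, from any graph, a subject string and a backreference regex that match exactly when the graph is 3-colorable, so the regex engine's backtracking performs the NP-hard search. (Vertices are numbered 1..n, with n at most 99, the backreference limit.)

    import re

    def three_colorable(n, edges):
        # One "RGB;" block per vertex; capturing group i picks vertex i's color.
        subject = "RGB;" * n
        pattern = r"[RGB]*([RGB])[RGB]*;" * n
        # Per edge (u, v), the subject lists every ordered pair of *distinct*
        # colors; the backreferences \u\v must match one of those pairs, which
        # forbids u and v from sharing a color.
        for u, v in edges:
            subject += "RG;GR;RB;BR;GB;BG;"
            pattern += r"(?:[RGB]{2};)*\%d\%d;(?:[RGB]{2};)*" % (u, v)
        return re.fullmatch(pattern, subject) is not None

    # A triangle is 3-colorable; the complete graph K4 is not.
    print(three_colorable(3, [(1, 2), (2, 3), (1, 3)]))     # True
    print(three_colorable(4, [(1, 2), (1, 3), (1, 4),
                              (2, 3), (2, 4), (3, 4)]))     # False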

Therefore, given some interactive storytelling system that refers to a GAS structure, finding compression algorithms that increase the system's storytelling efficiency - saving some computational resources, particularly at runtime, by reusing objects - is possible, but none can be found that compresses the informational substrate, namely the foundational Character Elements and their quad-wise interplay. Storytelling effectiveness - which I measure as "number of pleasant surprises per player per session" - can only ever be increased by human authors. Everything that's not reused - i.e., all "information" in the sense of (Algorithmic) Information Theory - needs to be thought out and written before program execution, if the GAS structure is to be preserved during the interaction.

Okay then. At least I know what's up.

Sunday, 11. June 2006

Bots as newbie role-players

Good role-players stay in character when on-stage. Newbies generally have limited ability to respond; their conversation armamentarium is small. [Second Life, F, 57]
Via Terra Nova, I found that quote in "The Protocols of Role-Playing", another fresh publication by The Daedalus Project. It's about trying to understand role-playing by asking role-players to describe what counts as good role-playing and what its etiquette is. Since the bots I know generally have limited ability to respond, too, and their conversation armamentarium is also small, I wonder where the idea of casting a bot as a newbie role-player might lead. The article goes on to say: "A good role-player is not only consistent, but draws from a coherent character story or psychology to react to a wide range of scenarios."

This sounds like a high-level requirement for a generalized bot to me. I think there are several other useful hints in there:
  • Don't be a drama queen (a.k.a. "attention hog").
    React so as to accommodate other characters and their play.
  • Develop your character over time (this relates to Simon Laven's "continuous beta testing" pattern).
  • Mind that your character's way of speaking/spelling strongly influences its image in the minds of other players.
  • Don't act like you're forcing your character's personality upon others (the short form of this rule is: "Don't God-Mode" - catchy).
  • Don't let your character say things it couldn't possibly know at its current point of development.
The man behind The Daedalus Project, Nick Yee, specializes in online research surveys of players in immersive online environments. He has collected over 20,000 surveys from about 4,000 individual respondents, and publishes his findings online. Way cool.

Wednesday, 10. May 2006

developer := 'Mort' | 'Elvis' | 'Einstein'

Due to some blogging Microsoft employees and MVPs who disapprove of the practice, I now know that MS uses an internal classification scheme of programmer personalities when developing programming languages and tools. A software developer, MS usability folks reckon, will be either a Mort, an Elvis, or an Einstein.
"Mort, the opportunistic developer, likes to create quick-working
solutions for immediate problems and focuses on productivity and learns
as needed. Elvis, the pragmatic programmer, likes to create
long-lasting solutions addressing the problem domain, and learn while
working on the solution. Einstein, the paranoid programmer, likes to
create the most efficient solution to a given problem, and typically
learn in advance before working on the solution. In a way, these
personas have helped guide the design of features during the Whidbey
product cycle."
So as far as Microsoft is concerned, I'm Elvis. Which rocks, of course :-)

The scheme is a bit on the coarse-grained side for my liking. I love reducing the number of parameters as much as the next guy, but for bots, any character model with fewer than five categories seems to allow for too little behavioral discrimination to be useful. However, I can see its worth as a communication tool between MS employees.

Let's try recursive application: there's no reason why any developer who develops programming languages for other developers while thinking of developers as the set (Mort, Elvis, Einstein) should not also be either a Mort, an Elvis, or an Einstein. Programs are media; programmers' personalities influence program usage; it's turtles all the way down. If, like Richard Wallace, you deliberately encode parts of your personality in your bot's character, those parts can end up being reused by thousands of ALICE clones.

Just like actors, directors, and writers, software developers start by being spectators, and are always the first spectators of their own work. And we all should know which audiences we could be part of as spectators, because those are the audiences we might be able to work. AI developers will have to learn what it means to work an audience. So I should probably ignore Mort and Einstein for now, and concentrate on being Elvis.

Saturday, 22. April 2006

Chunkin' behavior

If there's a problem with the Zipf curve, it's that the frequency differences between the inputs become very small very fast, and thus less and less useful to hang any programmatic structure on. It doesn't tell me much if I happen to know that, in 1,000 conversations, the pattern "I LOVE YOU" was matched 20 times while "THAT IS THE SHIT" got 21 matches. The ranked patterns look like a random list.
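To illustrate, assume an idealized Zipf distribution (frequency of rank r proportional to 1/r; the corpus and vocabulary sizes below are made-up round numbers):

    # Expected match count of the pattern at rank r under idealized Zipf.
    n_inputs = 100_000    # total client inputs (assumed)
    n_patterns = 10_000   # distinct patterns (assumed)
    h = sum(1.0 / r for r in range(1, n_patterns + 1))   # normalization constant

    def count(rank):
        return n_inputs / (rank * h)

    print(round(count(1)))                        # ~10200: Rank 1 stands out
    print(round(count(500)), round(count(501)))   # 20 vs 20: indistinguishable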

Some AI developers therefore take an approach that looks for higher-level similarities between client behaviors and carves out larger "chunks" that can be addressed programmatically. Juergen Pirner, for instance, conceptualizes groups of client inputs as "tasks", and maintains a task list. Since I work with a functional programming paradigm, what's a "task" for him is a "function call" for me, but we're handling the same phenomena.

Let me step back a little: traditionally, Information Theory holds that low-frequency signals carry high information content, while high-frequency signals carry high ratios of noise (redundancy, &c.). Though not many people seem to be saying much about it at this point, natural languages work somewhat differently. For example, the pattern "YES" holds Rank 2 on my list, matching about 1.4 % of the inputs. That's two fifths of the percentage matched by Rank 1, the pattern which represents "[not recognized]" (i.e., "noise"), and it seems to be the same for the masters of English-speaking bots everywhere.
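In classical terms, the information content of a single signal is its surprisal, -log2(p); a quick sketch, with the probabilities taken from the figures in this post:

    from math import log2

    def surprisal(p):
        # Self-information of a signal with probability p, in bits.
        return -log2(p)

    print(surprisal(0.014))     # ~6.2 bits for "YES" at Rank 2
    print(surprisal(0.00055))   # ~10.8 bits for a deep-tail pattern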

But "yes" is nothing like "noise"; rather, it seems to be a kind of textual "meaning compressor". Depending on what was said before - the context -, the decompressed text can be infinitely varied:

"Yes." -> "I agree with you." <- "Do you agree with me?"
"Yes." -> "I do not agree with you." <- "You mean you don't agree with me?"
"Yes." -> "I want to get married." <- "Will you marry me?"
"Yes." -> "I want a divorce." <- "Will you divorce me?"
...

To me, this implies that I should treat the pattern "YES" as a high-frequency signal which supplies high information content, where at each signal instance, that content depends on the conversational context. My reaction is to declare "YES" to be a "function call", and to require the function it calls to be total: by my theory, every AI output can provide the context for a "YES" input, so the system must be able to infer a meaning of "YES" as a reply to every line it can output, and return a string that reacts to that meaning as the next output. In fact, typing/pasting "yes" into the input field regardless of the machine's output is one common way in which clients test the "awareness" of conversational interfaces.
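Here's the shape of that requirement as a minimal sketch (all names are invented; this is the contract, not an implementation): every output the machine can emit carries its own reading of a subsequent "yes", and a default branch keeps the function defined everywhere.

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class Output:
        text: str
        on_yes: Optional[Callable[[], str]] = None   # what "yes" means here

    def yes(context: Output) -> str:
        # Total function: defined for every output the machine can emit.
        if context.on_yes is not None:
            return context.on_yes()
        # Fallback guarantees totality: make the inferred agreement explicit.
        return f'So you agree that "{context.text}" - good to know.'

    print(yes(Output("Do you agree with me?",
                     on_yes=lambda: "Great, we agree.")))   # context-specific
    print(yes(Output("I like blue parrots.")))              # fallback keeps it total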

Such a funtion might be hard to construct, but if I had one, it would cover 1.4 % of my inputs with context-relevant outputs, all in one fell swoop. Now I can even take a wider angle and say that my function should be able to take symbols as input which I judge to be eqivalent to "YES", like "THAT IS RIGHT", "FOR SURE", "CERTAINLY", &c. Those are not as frequent as yes, but they're all in the Top 1000, pushing the coverage towards, say, 1.7 %.

Let's call this group of patterns "agreement valuators" - which other "valuators" could I have? Why, "disagreement valuators", of course! If my function could also process the disagreement valuator "NO", that would add another 0.7 % to its input coverage, resulting in 2.4 %. Adding "NOT", "WRONG", "FALSE", I'm approaching 2.8 %. "Consent/denial valuators" like "GOOD", "BAD", "COOL", and "UNCOOL" would push me well over 3 %.
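As a sketch, such a grouping might be nothing more than a dispatch table (the groups are the ones named above; the mechanics are my illustration):

    # Map Zipf-tail patterns onto the "valuator" function they call.
    AGREEMENT    = {"YES", "THAT IS RIGHT", "FOR SURE", "CERTAINLY"}
    DISAGREEMENT = {"NO", "NOT", "WRONG", "FALSE"}
    CONSENT      = {"GOOD", "COOL"}
    DENIAL       = {"BAD", "UNCOOL"}

    def classify(pattern):
        for name, group in [("agree", AGREEMENT), ("disagree", DISAGREEMENT),
                            ("consent", CONSENT), ("deny", DENIAL)]:
            if pattern in group:
                return name
        return None   # not a valuator; other chunks handle it

    print(classify("FOR SURE"))   # "agree": one function covers the whole group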

Therefore, it's desirable to write program text that provides such a total function: a function which processes all those "valuators" and maps them onto a "reasonable" output. But how can I make sure that the output can actually be called "reasonable" by any measure? To test this, I can make use of another obvious high-frequency/high-information "meaning compressor" - the pattern "WHY".

What needs to happen here is basically the reverse of what needs to happen in the "valuator" function: let's say that the client got a valuator as output from the machine. If this valuator represents actual information (i.e., in the case of non-redundancy), humans almost reflexively ask for the reason behind it - "Why?" (example expansion: "What was the reason for you to think that I would agree with you?"). This is why the pattern "WHY" commands Rank 4 on my list (0.35 %), after "[not recognized]", "YES", and "NO". Since inputting serial "why"s is another popular way to test a bot, I want the "reason" function to be total, too: for each of its outputs, the system must be able to give a reason, which has a reason . . . &c.
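The shape of that requirement, as another invented sketch: every statement links to the reason behind it, and a base-case axiom keeps serial "why"s from falling off the end.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Statement:
        text: str
        reason: Optional["Statement"] = None

    AXIOM = Statement("Because that is how I was written.")   # base case

    def why(s: Statement) -> Statement:
        # Total: every statement has a reason; the axiom is its own reason.
        return s.reason if s.reason is not None else AXIOM

    s = Statement("I think you agree with me.",
                  reason=Statement("You said 'yes' to my last question."))
    print(why(s).text)        # the stated reason
    print(why(why(s)).text)   # serial "why"s terminate at the axiom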

Though this is somewhat difficult to implement in any existing programming language, the payoff I expect is definitely an incentive for me to work very hard at it. A totally defined "reason" function would not only service the "why" input, but also many other inputs that I interpret as equivalent: "For what reason?", "How come?", "I don't think so", "I think you're wrong", &c. - I see them all as calling the "reason" function. So I integrate them as recognized function calls, and instead of having to muse about what to do with a certain pattern that has a 0.055 % matching probability, I just add it to the set of patterns that call "reason", the overall effect being that I push the coverage of this function up to 2 %.

All this means that, by integrating patterns which are distributed along the Zipf curve, I have a way of compressing it: in combination, the two functions I described cover about 5 % of my input space already. This is encouraging, so I'll extend the approach: even though I'm not likely to get the compression ratios of the top two functions as I go further down the curve, if I could find a dozen such functions that integrate, say, the most frequent 5,000 patterns, that would give me, like, 50 % of the coverage of the 2,000,000-pattern Parsimony system. So let's see: pattern "WHAT" (Rank 11) suggests a "purpose" function; pattern "WHAT IS *" (Rank 21) suggests a "definition" function . . .
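That "like, 50 %" guess can at least be sanity-checked under the idealized Zipf assumption from above: the share of matches covered by the top k of N patterns is H(k)/H(N), a ratio of harmonic numbers. The idealization is mine; the 5,000 and 2,000,000 figures are from this post.

    def H(n):
        # n-th harmonic number: total Zipf weight of the top n ranks.
        return sum(1.0 / r for r in range(1, n + 1))

    print(H(5_000) / H(2_000_000))   # ~0.6 - the "like, 50 %" ballpark holds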

Next stop: closed-world negation and partial functions.
