Saturday, 22. April 2006

Chunkin' behavior

If there's a problem with the Zipf curve, it's that the frequency differences between the inputs become very small very fast, and thus more and more useless for hanging any programmatic structure on. It doesn't tell me much if I happen to know that, in 1,000 conversations, the pattern "I LOVE YOU" was matched 20 times while "THAT IS THE SHIT" got 21 matches. The ranked patterns look like a random list.

Some AI developers therefore take an approach that looks for higher-level similarities between client behaviors and carves out larger "chunks" that can be addressed programmatically. Juergen Pirner, for instance, conceptualizes groups of client inputs as "tasks", and maintains a task list. Since I work with a functional programming paradigm, what's a "task" for him is a "function call" for me, but we're handling the same phenomena.

Let me step back a little: traditionally, Information Theory assumes that low-frequency signals are associated with high information ratios, while high-frequency signals are associated with high noise ratios (redundancy, &c.). Though not many people seem to be saying much about it at this point, natural languages work somewhat differently. For example, the pattern "YES" holds Rank 2 on my list, matching about 1.4 % of the inputs. That's two-fifths of the percentage matched by Rank 1, the pattern which represents "[not recognized]" (i.e. "noise"), and it seems to be the same for masters of English/American-speaking bots everywhere.

But "yes" is nothing like "noise"; rather, it seems to be a kind of textual "meaning compressor". Depending on what was said before - the context -, the decompressed text can be infinitely varied:

"Yes." -> "I agree with you." <- "Do you agree with me?"
"Yes." -> "I do not agree with you." <- "You mean you don't agree with me?"
"Yes." -> "I want to get married." <- "Will you marry me?"
"Yes." -> "I want a divorce." <- "Will you divorce me?"

To me, this implies that I should treat the pattern "YES" as a high-frequency signal which supplies a high information content, where at each signal instance, that content depends on the conversational context. My reaction to this is to declare "YES" to be a "function call", and require the function it calls to be total: by my theory, every AI output can provide the context for a "YES" input, so the system must be able to infer a meaning of "YES" as a reply to every line it can output. The fact is that typing/pasting "yes" into the input field regardless of the machine's output is one common way in which clients test the "awareness" of conversational interfaces. This suggests to me that I need a total function here, which can assign a meaning to a "yes" input referring to every possible output the machine can generate, and return a string that reacts to that meaning as the next output.

Such a function might be hard to construct, but if I had one, it would cover 1.4 % of my inputs with context-relevant outputs, all in one fell swoop. Now I can even take a wider angle and say that my function should be able to take symbols as input which I judge to be equivalent to "YES", like "THAT IS RIGHT", "FOR SURE", "CERTAINLY", &c. Those are not as frequent as "YES", but they're all in the Top 1000, pushing the coverage towards, say, 1.7 %.

Let's call this group of patterns "agreement valuators" - which other "valuators" could I have? Why, "disagreement valuators", of course! If my function could also process the disagreement valuator "NO", that would add another 0.7 percent to its input coverage, resulting in 2.4 %. Adding "NOT", "WRONG", "FALSE", I'm approaching 2.8 %. "Consent/denial valuators" like "GOOD", "BAD", "COOL", and "UNCOOL" would push me well over 3 %.
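
A minimal Python sketch of what such a valuator function could look like. The pattern sets and the four context/meaning pairs are taken from the text above; every function name, and the generic fallback that makes the function total, are assumptions for illustration rather than the actual implementation.

```python
# Hypothetical sketch: a "valuator" function that is total over the bot's outputs.
AGREE = {"YES", "THAT IS RIGHT", "FOR SURE", "CERTAINLY"}
DISAGREE = {"NO", "NOT", "WRONG", "FALSE"}

# Known expansions of an agreement, keyed on the bot's previous output.
AGREE_MEANINGS = {
    "Do you agree with me?": "I agree with you.",
    "You mean you don't agree with me?": "I do not agree with you.",
    "Will you marry me?": "I want to get married.",
    "Will you divorce me?": "I want a divorce.",
}


def normalize(text):
    """Reduce a client input to an upper-case pattern without end punctuation."""
    return text.strip().rstrip(".!?").upper()


def classify(client_input):
    """Return 'agree', 'disagree', or None if the input is not a valuator."""
    pattern = normalize(client_input)
    if pattern in AGREE:
        return "agree"
    if pattern in DISAGREE:
        return "disagree"
    return None


def expand(last_output, client_input):
    """Decompress a valuator into a meaning relative to the bot's last output."""
    kind = classify(client_input)
    if kind is None:
        return None  # not a valuator; some other function has to handle it
    if kind == "agree" and last_output in AGREE_MEANINGS:
        return AGREE_MEANINGS[last_output]
    # Assumed fallback: it keeps the function defined for every possible output,
    # at the price of a much less specific meaning.
    verb = "accept" if kind == "agree" else "reject"
    return f'I {verb} what you said: "{last_output}"'


print(expand("Will you marry me?", "Yes."))
print(expand("What is your eye color?", "no"))
```

The fallback is the cheap way to get totality. The hard part described above is replacing it with meanings that are actually specific to each output.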

Therefore, it's desirable to write program text that provides such a total function: a function which processes all those "valuators" and maps them on a "reasonable" output. But how can I make sure that the output can actually be called "reasonable" by any measure? To test this, I can make use of another obvious high frequency/high information "meaning compressor" - the pattern "WHY".

What needs to happen here is basically the reverse of what needs to happen in the "valuator" function: let's say that the client got a valuator as output from the machine. If this valuator represents actual information (i.e. in case of non-redundancy), humans almost reflexively ask for a reason behind this valuator - "Why?" (example expansion: "What was the reason for you to think that I would agree with you?"). This is why the pattern "WHY" commands Rank 4 in my list (0.35 %), after "[not recognized]", "YES", and "NO". Since inputting serial "why"s is another popular way to test a bot, I want the "reason" function to be total, too: for each of its outputs, the system must be able to give a reason, which has a reason . . . &c.

Though this is somewhat difficult to implement in any existing programming language, the payoff I expect is definitely an incentive for me to work very hard at it. Because a totally defined "reason" function would not only service the "why" input, but also many other inputs that I interpret as "being equivalent": "For what reason?", "How come?", "I don't think so", "I think you're wrong", &c. - I see them all as "calling the reason function". So I integrate them as recognized function calls, and instead of having to muse about what I do with a certain pattern that has a 0.055 % matching probability, I just add it to the set of patterns that call "reason", the overall effect being that I push up the coverage of this function to 2 %.
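
As a rough Python sketch of the idea: several input patterns are treated as calls to one "reason" function, and serial "why"s walk down a chain of reasons. The trigger phrases come from the paragraph above; the stored reasons, the function names, and the generic fallback are hypothetical and only there to show the shape of the thing.

```python
# Hypothetical sketch: many patterns, one "reason" function.
REASON_TRIGGERS = {
    "WHY",
    "FOR WHAT REASON",
    "HOW COME",
    "I DON'T THINK SO",
    "I THINK YOU'RE WRONG",
}

# Invented example content: each output has a reason, which is itself an output.
REASONS = {
    "I think you agree with me.": "You said yes.",
    "You said yes.": "That is what I read in your last input.",
}


def calls_reason(client_input):
    """True if the input is one of the patterns that call the reason function."""
    return client_input.strip().rstrip(".!?").upper() in REASON_TRIGGERS


def reason(output):
    """Return a reason for any output; the fallback is what makes this total."""
    return REASONS.get(output, f'I have no better reason for saying "{output}".')


def respond(last_output, client_input):
    """Answer with a reason for the last output, or None for other inputs."""
    if calls_reason(client_input):
        return reason(last_output)
    return None


# Serial "why"s: each answer becomes the context for the next question.
line = "I think you agree with me."
for question in ["Why?", "How come?", "For what reason?"]:
    line = respond(line, question)
    print(question, "->", line)
```

Adding a low-frequency pattern then means adding one entry to `REASON_TRIGGERS` instead of authoring a new response. The generic fallback is a stand-in; the goal described above is a reason function whose answers stay meaningful all the way down.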

All this means that, by integrating patterns which are distributed along the Zipf curve, I have a way of compressing it: in combination, the two functions I described cover about 5 % of my input space already. This is encouraging, so I'll extend it: even though I'm not likely to get the compression ratios of the top two functions when I go further down the curve, if I could find me a dozen that can integrate, say, the most frequent 5,000 patterns, that would give me, like, 50 % of the coverage of the 2,000,000-pattern Parsimony system. So let's see: pattern "WHAT" (Rank 11) suggests a "purpose" function; pattern "WHAT IS *" (Rank 21) suggests a "definition" function . . .

Next stop: closed-world negation and partial functions.

Sunday, 25. September 2005

Breaking it down

Recent discussions in the Robitron group have prompted me to break down my personal view of the Turing Test problem into as few simple statements as possible. Here's what I came up with so far:

1. Turing's Original Imitation Game (OIG) imagines a computer program that can imitate a man that is imitating a woman - an activity that can be regarded as being a form of improvisational acting.

2. To imitate a woman, a man has to identify himself with a woman.

3. Therefore, to successfully play the Imitation Game, a computer program must be able to identify with, and thereby act as, another person - it has to be able to do what an improv actor does.

4. "Identification", the way actors understand it, means to simulate the inner states of a person from a first-person perspective (though this is most often expressed less formally as "to step into somebody else's shoes").

5. To win, the OIG-playing computer program therefore has to succeed in simulating the inner states of a person from a first-person perspective.

6. To create a program that can simulate the inner states of a person from a first-person perspective, one would have to first come up with an exhaustive formalization of inner first-person states, represented in terms of Turing computation.

7. Despite an international research effort that now spans 55 years and has involved thousands of the world's brightest minds and many billions of dollars, such a formalization is still unavailable.

8. This - to me - is evidence that such a formalization might be impossible to create, and that the inner first-person states of humans cannot be exhaustively formalized.

Any disagreement up to here?

Tuesday, 17. May 2005

Let 'em try to pwn you

Nathan Combs at Terra Nova on the practice of zerging:
To the casual observer, zerging may seem a form of virtualized hooliganism. Yet, it might also be seen as a natural consequence of the (arguable) first strategy (above all others) of good gamers: test the boundaries of your world, its rules, discover its patterns, and then pwn!
Which is why I think that griefing should be supported as a standard behavior in interactive stories. If gamers will do it anyway, you're better off using it to provide dramatic conflict than trying to "manage it away".

Monday, 16. May 2005

The rational roots of complete craziness

Actually, I didn't start out working on bots with any crazy theories about them being method actors or something. I started out with some simple observations which led to simple theories, most of which I think are fairly uncontroversial. The weirdo stuff just grew as I tried to write programs that covered those basics.

First of all, the average conversation between a bot and a human is an incoherent mess. The proof of this assertion is the conversation logs: such a log might have its coherent passages, but on the whole, it reads like a random bunch of text lines.

Client: That is an evasive answer.
Bot: Heraclitus said: "You don't step into the same river twice."
Client: Why?
Bot: Why not?
Client: Who is Heraclitus?
Bot: I don't know who that is. What is your eye color?

Crap dialog. But fairly typical. So the mission for me became: Figure out how to write bots that say things which result in better logs.

What do you find when you read the typical log? Here is a list of the Top 20 "atomic" (meaning: fully recognized) client inputs, as received by the well-known ALICE bot, and published by its author, Richard Wallace, in an essay about Zipf's Law:

8024 YES
5184 NO
2268 OK
2006 WHY
1145 BYE
946 HI
846 WHAT
663 GOOD
584 OH
544 YOU
525 COOL

The numbers represent the input frequency, indicating, for example, that input #1, YES, is about 16 times more likely to occur than input #20, THANK YOU. It's obvious that, to maintain anything resembling an "intelligent" conversation, a bot would have to respond plausibly at least to the most frequent inputs. It's also obvious that, to do that, it would have to be able to figure out what YES, WHY, WHAT mean in each case, with reference to (as a minimum) its own last output.
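
A small Python sketch of what these counts imply when several frequent inputs are routed to one context-sensitive handler. The counts are the ones listed above; the grouping into handlers and the handler names are assumptions made for illustration, not part of ALICE or of Wallace's essay. The shares are relative to the listed inputs only, not to all client inputs.

```python
# Counts as listed above (only the entries that appear in the list).
COUNTS = {
    "YES": 8024, "NO": 5184, "OK": 2268, "WHY": 2006, "BYE": 1145, "HI": 946,
    "WHAT": 846, "GOOD": 663, "OH": 584, "YOU": 544, "COOL": 525,
}

# Hypothetical grouping: which inputs might be served by one shared handler.
HANDLERS = {
    "valuator": ["YES", "NO", "OK", "GOOD", "COOL"],
    "reason": ["WHY"],
    "purpose": ["WHAT"],
}


def handler_shares(counts, handlers):
    """Share of the listed inputs that each handler would have to answer."""
    total = sum(counts.values())
    return {
        name: sum(counts[pattern] for pattern in patterns) / total
        for name, patterns in handlers.items()
    }


shares = handler_shares(COUNTS, HANDLERS)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:9s} {share:6.1%}")
```

Whatever the exact grouping, each handler still needs the bot's last output as context before it can say anything plausible.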

For AIML users, there are several ways to achieve this: either by simply using the <that/> and/or <topic/> tags provided by the language for this purpose, or by developing more general functions that use recursion to increase process intensity, thereby saving authoring time while boosting control.

But: for an AIML set that includes, say, 40,000 categories - that's about the size of the very popular AAA set -, is there anything that might allow me to assume that 2006 WHY-inputs correspond to significantly less than 2006 different intended meanings of WHY? No, there isn't. It is plausible for the client to ask WHY as a response to many more than 2006 of the outputs that this set returns. So whichever technique you use: referring to the conversation state in a systematic way, even with regard to just one input, will almost inevitably lead to the problem of state space explosion. Unless...

Unless you use self-reference, building up your content in a way exemplified by this little puzzle. Doing so might put you in a position where you say things that some other people think of as complete craziness, but on the other hand, it also has its advantages. More on which later...

Tuesday, 10. May 2005

Why Interaction = Conflict?

In reaction to Different folks got different problems, Andrew Stern commented:
While I agree with Conflict = Story, I don't think Interaction = Conflict is always true. That is, I don't think players will always generate interesting conflict; without some help from dramatists (e.g. a drama manager), naive players may only generate banal, uninteresting conflicts, such as griefing or attempts to break the AI.
But where's the agency in that? Why view the player/client as "naive", and judge her actions as "banal" and "uninteresting"? This is not at all an attitude that I would suggest.

In Improv acting, this is called "blocking" (unfortunately, there's other usage of "blocking" in Method acting - "[t]he placement and movement of actors in a dramatic presentation" -; namespaces are sooo important).
If you are offered an idea by another player that you reject, ignore, or condemn, you are Blocking. The scene dies at this point and all cooperation is lost.
So as a rule, I design my bots so as not to block. A block is a bug. Go with the flow.

But of course, *stomp* the mofu who insists on playing the nasty for too long. "You know what your problem is? You can't believe that I'M THE BOSS 'round here is what your problem is, dude!" That's your drama right there. So I support Griefing as a common behavior for all my Actors, human or not.

What does this have to do with Interaction = Conflict?

I look at the Imitation Game as an application composed from four components: Game, Interaction, Conflict, Story. Conceptually, I view the arrangement as two crosswisely operating flip-flops, where one flip-flop unit's output at any time is either Game or Story, and the other one's is either Interaction or Conflict. At any time during operation, the System is in one of four states:

Game , Interaction
| Game , Conflict
| Story , Interaction
| Story , Conflict

meaning that there are always two components delivering output. Empty outputs are legal, as well as empty inputs. But the normal mode of operation is that the player/client enters some textual input, which changes the bot's state, the state change generating an output as a side effect.
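
The four states and the input/state/output cycle can be sketched in Python as below. This is only an illustration: the type names and the transition rule (an input ending in "!" flips Interaction/Conflict) are placeholders invented here, since the actual rules that flip the two units are not spelled out in this post.

```python
from dataclasses import dataclass
from enum import Enum


class Frame(Enum):
    GAME = "Game"
    STORY = "Story"


class Mode(Enum):
    INTERACTION = "Interaction"
    CONFLICT = "Conflict"


@dataclass(frozen=True)
class State:
    frame: Frame  # output of the first flip-flop
    mode: Mode    # output of the second flip-flop

    def flip_frame(self):
        other = Frame.STORY if self.frame is Frame.GAME else Frame.GAME
        return State(other, self.mode)

    def flip_mode(self):
        other = Mode.CONFLICT if self.mode is Mode.INTERACTION else Mode.INTERACTION
        return State(self.frame, other)


# The product of the two flip-flops gives exactly the four states listed above.
ALL_STATES = [State(frame, mode) for frame in Frame for mode in Mode]


def step(state, client_input):
    """Consume one input, return (new_state, output)."""
    text = client_input.strip()
    if not text:
        return state, ""  # empty inputs and empty outputs are both legal
    if text.endswith("!"):  # placeholder trigger, not taken from the post
        state = state.flip_mode()
    # The output is produced from the state the input has led to.
    return state, f"[{state.frame.value}, {state.mode.value}]"


state = State(Frame.GAME, Mode.INTERACTION)
for line in ["Hello", "", "I'M THE BOSS!"]:
    state, output = step(state, line)
    print(repr(line), "->", repr(output))
```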

As for the Interaction | Conflict pair, the purpose of the Interaction component is to minimize the effects of Conflict, and the purpose of the Conflict component is to minimize the effects of Interaction. Death is where maximum Conflict has driven the value of Interaction to zero.
