The rational roots of complete craziness
Actually, I didn't start out working on bots with any crazy theories about them being method actors or something. I started out with some simple observations which led to simple theories, most of which I think are fairly uncontroversial. The weirdo stuff just grew as I tried to write programs that covered those basics.
First of all, the average conversation between a bot and a human is an incoherent mess. The proof of this assertion is the conversation logs: such a log might have its coherent passages, but on the whole, it reads like a random bunch of text lines.
Client: That is an evasive answer.
Bot: Heraclitus said: "You don't step into the same river twice."
Client: Why?
Bot: Why not?
Client: Who is Heraclitus?
Bot: I don't know who that is. What is your eye color?
Crap dialog. But fairly typical. So the mission for me became: Figure out how to write bots that say things which result in better logs.
What do you find when you read the typical log? Here is a list of the Top 20 "atomic" (meaning: fully recognized) client inputs, as received by the well-known ALICE bot, and published by its author, Richard Wallace, in an essay about Zipf's Law:
8024 YES
5184 NO
2268 OK
2006 WHY
1145 BYE
1101 HOW OLD ARE YOU
946 HI
934 HOW ARE YOU
846 WHAT
840 HELLO
663 GOOD
645 WHY NOT
584 OH
553 REALLY
544 YOU
531 WHAT IS YOUR NAME
525 COOL
516 I DO NOT KNOW
488 FUCK YOU
486 THANK YOU
The numbers represent the input frequency, indicating, for example, that input #1, YES, is about 16 times more likely to occur than input #20, THANK YOU. It's obvious that, to maintain anything resembling an "intelligent" conversation, a bot would have to respond plausibly at least to the most frequent inputs. It's also obvious that, to do that, it would have to be able to figure out what YES, WHY, WHAT mean in each case, with reference to (as a minimum) its own last output.
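To put the Zipf connection into numbers, here is a small Python sketch (an illustration, not taken from Wallace's essay) that recomputes the ratio from the list above and compares it with an idealized Zipf law, f(r) proportional to 1/r^s, assuming an exponent of s = 1. The names TOP20, observed_ratio, zipf_ratio and share_of_top20 are made up for this example.

```python
# Frequencies of the Top 20 "atomic" inputs, copied from the list above
# (rank 1 = YES ... rank 20 = THANK YOU).
TOP20 = [
    ("YES", 8024), ("NO", 5184), ("OK", 2268), ("WHY", 2006),
    ("BYE", 1145), ("HOW OLD ARE YOU", 1101), ("HI", 946),
    ("HOW ARE YOU", 934), ("WHAT", 846), ("HELLO", 840),
    ("GOOD", 663), ("WHY NOT", 645), ("OH", 584), ("REALLY", 553),
    ("YOU", 544), ("WHAT IS YOUR NAME", 531), ("COOL", 525),
    ("I DO NOT KNOW", 516), ("FUCK YOU", 488), ("THANK YOU", 486),
]

def observed_ratio(a: int, b: int) -> float:
    """How many times more often rank a occurs than rank b (1-based ranks)."""
    return TOP20[a - 1][1] / TOP20[b - 1][1]

def zipf_ratio(a: int, b: int, s: float = 1.0) -> float:
    """Ratio predicted by an idealized Zipf law f(r) ~ 1 / r**s (assumed s = 1)."""
    return (b / a) ** s

def share_of_top20(n: int) -> float:
    """Fraction of all Top 20 occurrences taken up by the n most frequent inputs."""
    total = sum(freq for _, freq in TOP20)
    return sum(freq for _, freq in TOP20[:n]) / total

if __name__ == "__main__":
    print(f"YES vs THANK YOU, observed: {observed_ratio(1, 20):.1f}x")
    print(f"YES vs THANK YOU, Zipf s=1: {zipf_ratio(1, 20):.1f}x")
    print(f"Top 4 share of the Top 20:  {share_of_top20(4):.0%}")
```

With these figures, YES occurs about 16.5 times as often as THANK YOU, a bit less than the 20x the idealized curve would predict, and YES, NO, OK and WHY alone make up roughly 61% of the Top 20. That concentration is why getting those few inputs right matters so much.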
For AIML users, there are several ways to achieve this: either by simply using the <that/> and/or <topic/> tags provided by the language for this purpose, or by developing more general functions that use recursion to increase process intensity, thereby saving authoring time while boosting control. But: for an AIML set that includes, say, 40,000 categories (about the size of the very popular AAA set), is there anything that might allow me to assume that 2006 WHY inputs correspond to significantly fewer than 2006 different intended meanings of WHY? No, there isn't. It is plausible for the client to ask WHY in response to many more than 2006 of the outputs that this set returns. So whichever technique you use, referring to the conversation state in a systematic way, even with regard to just one input, will almost inevitably lead to the problem of state space explosion. Unless...
Unless you use self-reference, building up your content in a way exemplified by this little puzzle. Doing so might put you in a position where you say things that some other people think of as complete craziness, but on the other hand, it also has its advantages. More of which later...
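The <that/> approach and the state space problem described above can be sketched in a few lines of Python. This is not AIML and not how any AIML interpreter works internally; it's a hypothetical emulation, assuming a rule table keyed on (client input, bot's previous output), plus a context-free fallback rule (that = None) that produces exactly the kind of "Why not?" seen in the log at the top. The rules, their wording and the helpers normalize, respond and rules_needed are all invented for illustration.

```python
import re
from typing import Optional

def normalize(text: str) -> str:
    """Uppercase and replace punctuation with spaces, loosely like AIML input normalization."""
    return " ".join(re.sub(r"[^A-Za-z0-9 ]+", " ", text).upper().split())

# Hypothetical rules: (client input, bot's previous output or None, response).
# A None "that" marks a context-free fallback for that input.
_RAW_RULES = [
    ("why", 'Heraclitus said: "You don\'t step into the same river twice."',
     "Because the water keeps moving, and so do you."),
    ("why", "What is your eye color?",
     "I'm trying to picture who I'm talking to."),
    ("why", None, "Why not?"),
]

RULES: dict[tuple[str, Optional[str]], str] = {
    (normalize(inp), None if that is None else normalize(that)): response
    for inp, that, response in _RAW_RULES
}

def respond(user_input: str, last_output: str) -> str:
    """Prefer a rule that matches the bot's last output; else use the fallback."""
    key = normalize(user_input)
    specific = RULES.get((key, normalize(last_output)))
    if specific is not None:
        return specific
    return RULES.get((key, None), "I don't know what you mean.")

def rules_needed(context_inputs: int, distinct_outputs: int) -> int:
    """Worst case: one context-sensitive rule per (input, preceding output) pair."""
    return context_inputs * distinct_outputs
```

The last function is the catch: if, purely for illustration, each of 40,000 categories produced one distinct output, then giving WHY a context-sensitive answer after each of them would take up to 40,000 rules for that one input alone.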
scheuring - 16. May, 18:23
Re: on being highly responsive
definitely.
So whichever technique you use, referring to the conversation state in a systematic way, even with regard to just one input, will almost inevitably lead to the problem of state space explosion.
Not if you can believably map those thousands (millions) of valid player inputs into a small but rich set of responses, with enough generativity built in to glue it all together.
If you have motivated characters with dramatic goals they are pursuing (as opposed to aimless chatting), they have things they want to do and say, the range of which is necessarily focused and finite. So their task is to take what the player is saying, interpret it in some way that relates to their own dramatic goals (e.g., is the player disagreeing with me? being helpful? provoking me?) and then choose the best response they have, that continues towards their dramatic goals.
That is, they don't need to directly answer "why" to all the combinations; they can interpret the player's question as a general provocation, for example.
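Here is a rough Python sketch of the interpretation step described in this comment: map any player input onto a handful of stances and let the character answer each stance with a line that serves its own goal. Everything in it is hypothetical (the Stance categories, the keyword cues, the dinner "goal" and its lines); it only illustrates the idea and is not meant to reflect how Façade or any other real system does it.

```python
import re
from enum import Enum

class Stance(Enum):
    """Hypothetical coarse interpretations of a player input."""
    AGREE = "agree"
    DISAGREE = "disagree"
    PROVOKE = "provoke"
    OTHER = "other"

# Hypothetical keyword cues; a real system would need far richer recognition.
CUES = {
    Stance.PROVOKE: {"why", "really", "prove"},
    Stance.DISAGREE: {"no", "not", "wrong", "disagree"},
    Stance.AGREE: {"yes", "ok", "sure", "right", "agree"},
}

# One line per stance, each written to push the character's (hypothetical)
# dramatic goal: getting the player to stay for dinner.
RESPONSES = {
    Stance.AGREE: "Good, then you'll stay for dinner.",
    Stance.DISAGREE: "Come on, one dinner won't hurt.",
    Stance.PROVOKE: "Because I've been cooking all afternoon, that's why.",
    Stance.OTHER: "Anyway, about dinner...",
}

def interpret(player_input: str) -> Stance:
    """Read the input as a stance toward the character's goal; PROVOKE wins ties."""
    words = set(re.findall(r"[a-z']+", player_input.lower()))
    for stance in (Stance.PROVOKE, Stance.DISAGREE, Stance.AGREE):
        if words & CUES[stance]:
            return stance
    return Stance.OTHER

def reply(player_input: str) -> str:
    """Pick the goal-directed line for whatever stance the input was read as."""
    return RESPONSES[interpret(player_input)]
```

Note how "Why?" and "Why not?" both land on PROVOKE, so the character never needs a separate answer for every combination; the price is that plain keyword cues will misread plenty of inputs.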
Although focused towards dramatic goals, ideally the characters have very rich collections of ways (behaviors) to get there, including some ability to generatively tweak/modify their dialog to glue it at least a bit to the particulars of what tangent the player may be going off on. Even more ideally, the dialog itself is written generatively, allowing that much more richness; the dramatic focus of the situation may keep that generativity task tractable.
Successful open-ended chat, I think, is a bigger, harder problem than creating successful dramatically focused conversation. The state space of the latter, while still huge, is smaller and more feasible to implement successfully.
(btw, the fact that "aimless" begins with AIML is purely coincidental. no offense intended, really. sorry, bad joke.)
Re: on being highly responsive
One way to approach this issue is to locate that "centre of gravity" between two characters who represent two opposing "world views" with regard to how to solve a particular problem. Let's refer to them as the Main Character and the Obstacle Character, and, as a pair, as the Subjective Characters. In Star Wars Episode IV, for instance (an example I use because the story is so widely known, and works very well as a Grand Argument Story), the Subjective Characters are Luke Skywalker and Obi Wan Kenobi. The outcome of their Subjective Story (will the young hotspur finally adopt that old hermit's faith, stop thrashing about, and trust the Force?) determines the outcome of the Objective Story (will the Death Star be destroyed, and the Rebels be saved?).
From what I can tell (I haven't had the chance to play the game yet, but I've read the papers), the way you set up the drama in Façade casts Grace and Trip as the Subjective Characters. If I understand your story structure correctly (please tell me if I'm mistaken), your Objective Story is about this couple (in their function as Objective Characters) having a visitor (the player, as another Objective Character) at their house for drinks, while the Subjective Story is about the couple's relationship being on the rocks. This would be quite similar to the basic structure of Who's Afraid of Virginia Woolf, minus one Objective Character, whose dramatic functions could be handed over to the other characters (for those who are interested in the details of how "Virginia Woolf" works as a drama: I've checked, and there's a decent analysis of it included with the demo of the Dramatica software, in the "Examples" folder).
So as a dramatic setup, this should work. However, I believe that casting the player as an Objective Character, and having her participate only in the "dispassionate argument" of the Objective Story, is a choice rather than a necessity. An alternative would be to assign either the Main Character or the Obstacle Character function to the player as well, so that the "passionate argument" of the Subjective Story would be raised between a human and a virtual actor, rather than between two virtual ones. That is how I approach it, which might, at least in part, explain my different take on the matter.
First of all, for the purpose of creation, I assume that "aimless chat" doesn't exist. Whether this is true or not on scientific grounds is not important to me; what's important is that the bot character is designed so as to interpret any input as if it were backed by some intention, or goal. Whether a particular interpretation is "correct" from the point of view of the player/client who originates the input isn't important, either; what's important is that the individual results brought about by a string of such interpretations during a conversation accumulate to form the coherent picture of an "intelligent" character. No matter the sequence in which the outputs are elicited, they should always relate to the input as well as to one another and, as a whole, "add up" to something which can be called "characteristic". The net effect I'm trying to have on the client is that, after a while, the character should become apparent to her, so that she might think something like: "Well, this is not how I see things, but yeah, if you look at it that way, it sure makes sense". If I ever manage to create something that causes clients to react in that fashion, then I will feel that I've created something that deserves to be called an "interactive character" as I define it.
To get there, I'm doing what I think is the opposite of what most other people who work in AI are doing. Most seem to think in terms of "huge": huge state spaces, huge databases, etc. I think "small". More precisely, and taking a cue from Guy Steele, I think in terms of "growing a language". This language starts out with a single output sentence. For quite a while, lacking a better idea, I thought of this as the Gödelian sentence, always knowing that this would get me in trouble with the hardcore math geeks if I ever mentioned it to them. Recently, I got lucky, since Aubrey came up with a much better term to steal: Edge Metaphor, "a [plausible] boundary to possibility" (thanks, Aubrey). Building on this Edge Metaphor, the language (actually the story) is grown in terms of itself.
As usual (I have "invented" several technical things that, as I later found out, already had long-standing theoretical coverage), I only discovered the existing theoretical "padding" for this practice as I went along, but I now think that what I'm doing is akin to Wittgenstein's language games, in particular the one that Jean-Francois Lyotard has dubbed "the prescriptive game", where the emphasis is on social values rather than factual truth. This may sound mighty theoretical, but the bot I'm currently working on is not designed to reveal the theory behind itself; it will have a quite pragmatic view of the world.
I think that, theoretically, a bot that would argue for Wittgenstein's theory while being based on it could actually be built, and that that could be fun, but it would result in a very different character, and be a much larger work than the one I'm trying to finish now. Instead, I picked a relatively short "source text" from the Internet that the bot refers to, to prove that one can take an arbitrary linear text and make it interactive.
I don't want to reveal the details here, since that would spoil the fun that I hope you'll have playing with the program once it's ready. Suffice it to say that the idea is that you can ask "what", "when", "where", "how", "who", "why", and "what is that" about anything the bot says (I'll be cheating a bit, to save production time, but the bot will know when it does it, and also why it does it). Of course, you can also say anything else you want to say, and the bot will interpret it and make sense of it (as long as there's no bug, but it should know that it might have bugs, too...), and it can say why it says that... and so on. This is possible because all the content eventually leads you to the Edge Metaphor... and once you reach that, it fans out again from there!
The reason why I came across this method and chose it over working with "dialog acts" and similar abstractions is threefold: 1. such abstractions usually discard a lot of the information in the input in favor of an approximation that often doesn't really fit the bill; 2. clients often use bots as if they were search engines, typing in single words, which are hard to classify as "dialog acts" or anything similar, so I found it better to let the word be my basic computational object; and 3. in the absence of a general Theory of Causality, I couldn't find any other way to really ground any reasoning (i.e. answer a potentially infinite string of consecutive "why"s). Sure, the purists of logic will rightfully accuse my bots of "circular reasoning", but hey, I never claimed my method would require no trade-offs! Plus, Wittgenstein seems to have thought circular reasoning was cool, too, and I hear he was "one of the foremost philosophers of the 20th century". So there.
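The circular structure described above can be illustrated with a small Python sketch. Assumptions, loudly: the Statement type, the three placeholder statements and the helpers is_closed, why_chain, steps_to_edge and mentions are invented for this example and do not reproduce the bot described here. Each statement names the statement that answers "why?" about it, the "edge" entry stands in for the Edge Metaphor and answers its own "why", and single words serve as lookup keys, in the spirit of point 2.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    text: str
    because: str  # id of the statement that answers "why?" about this one

# Hypothetical placeholder content. "edge" stands in for the Edge Metaphor:
# its own "why" points back to itself, closing the loop.
STATEMENTS = {
    "river": Statement("You don't step into the same river twice.", "change"),
    "change": Statement("The river keeps changing, and so do you.", "edge"),
    "edge": Statement("Change is the edge of what can be explained.", "edge"),
}

def is_closed(statements: dict[str, Statement]) -> bool:
    """True if every 'why' points at a statement that exists."""
    return all(s.because in statements for s in statements.values())

def why_chain(start: str, depth: int) -> list[str]:
    """Answer `depth` consecutive 'why?' questions, starting from statement `start`."""
    answers, current = [], start
    for _ in range(depth):
        current = STATEMENTS[current].because
        answers.append(STATEMENTS[current].text)
    return answers

def steps_to_edge(start: str, edge: str = "edge") -> int:
    """Number of 'why?'s needed to reach the edge statement from `start`."""
    seen, current, steps = set(), start, 0
    while current != edge:
        if current in seen:
            raise ValueError("cycle that never reaches the edge")
        seen.add(current)
        current = STATEMENTS[current].because
        steps += 1
    return steps

def mentions(word: str) -> list[str]:
    """Ids of statements containing `word`, so the word itself is the lookup key."""
    w = word.lower()
    return [sid for sid, s in STATEMENTS.items()
            if w in re.findall(r"[a-z']+", s.text.lower())]
```

Because every "why" points at an existing statement (is_closed returns True), why_chain can answer any number of consecutive "why"s with just three statements: from "river" it takes two steps to reach the edge, and from then on it keeps answering with the edge statement. That is the circularity admitted above; a fuller version would let the edge point at several statements, so that the content fans out again instead of looping on a single line.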
(btw, there's a prominent member of the AIML community only known to us as Doubly Aimless ;-)