The rational roots of complete craziness
Actually, I didn't start out working with bots having any crazy theories about them being method actors or something. I started out with some simple observations, which led to simple theories, most of which I think are fairly uncontroversial. The weirdo stuff just grew as I tried to write programs that covered those basics.
First of all, the average conversation between a bot and a human is an incoherent mess. The proof of this assertion lies in the conversation logs: such a log might have its coherent passages, but on the whole, it reads like a random bunch of text lines.
Client: That is an evasive answer.
Bot: Heraclitus said: "You don't step into the same river twice."
Client: Why?
Bot: Why not?
Client: Who is Heraclitus?
Bot: I don't know who that is. What is your eye color?
Crap dialog. But fairly typical. So the mission for me became: Figure out how to write bots that say things which result in better logs.
What do you find when you read the typical log? Here is a list of the Top 20 "atomic" (meaning: fully recognized) client inputs, as received by the well-known ALICE bot, and published by its author, Richard Wallace, in an essay about Zipf's Law:
8024 YES
5184 NO
2268 OK
2006 WHY
1145 BYE
1101 HOW OLD ARE YOU
946 HI
934 HOW ARE YOU
846 WHAT
840 HELLO
663 GOOD
645 WHY NOT
584 OH
553 REALLY
544 YOU
531 WHAT IS YOUR NAME
525 COOL
516 I DO NOT KNOW
488 FUCK YOU
486 THANK YOU
The numbers represent the input frequency, indicating, for example, that input #1, YES, is about 16 times more likely to occur than input #20, THANK YOU. It's obvious that, to maintain anything resembling an "intelligent" conversation, a bot would have to respond plausibly at least to the most frequent inputs. It's also obvious that, to do that, it would have to be able to figure out what YES, WHY, and WHAT mean in each case, with reference to (as a minimum) its own last output.
For AIML users, there are several ways to achieve this: either by simply using the <that/> and/or <topic/> tags provided by the language for this purpose, or by developing more general functions that use recursion to increase process intensity, thereby saving authoring time while boosting control.
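To make this concrete, here is a minimal sketch of the <that/> approach, applied to the WHY exchange from the log above. The wording is mine, and I'm assuming an ALICE-style engine whose normalization expands "don't" to "do not" and strips the punctuation from the bot's last sentence:

<!-- Fallback: matches WHY after any output by the bot -->
<category>
  <pattern>WHY</pattern>
  <template>Why do you ask?</template>
</category>

<!-- Context-bound: matches WHY only when the bot's last sentence
     was the Heraclitus line -->
<category>
  <pattern>WHY</pattern>
  <that>YOU DO NOT STEP INTO THE SAME RIVER TWICE</that>
  <template>Because, as Heraclitus saw it, both you and the river keep changing.</template>
</category>

The second category fires only when its <that> matches the bot's previous line; otherwise, the first one catches WHY as a fallback.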
But: for an AIML set that includes, say, 40,000 categories - that's about the size of the very popular AAA set - is there anything that might allow me to assume that the 2006 WHY inputs correspond to significantly fewer than 2006 different intended meanings of WHY? No, there isn't. It is plausible for the client to ask WHY in response to many more than 2006 of the outputs that this set returns. And since each context-bound category handles exactly one pairing of an input with a previous output, the number of categories needed grows multiplicatively: binding just the Top 20 inputs above to each of 40,000 outputs could, in the worst case, call for 40,000 × 20 = 800,000 additional categories. So whichever technique you use: referring to the conversation state in a systematic way, even with regard to just one input, will almost inevitably lead to the problem of state space explosion. Unless...
Unless you use self-reference, building up your content in a way exemplified by this little puzzle. Doing so might put you in a position where you say things that some other people think of as complete craziness, but on the other hand, it also has its advantages. More on that later...
scheuring - 16. May, 18:23