Robot Soul (Algorithms, Oracles, and Computational Rhetorics)
http://twomorrow.twoday.net/
Algorithms, Oracles, and Computational Rhetorics
scheuring
scheuring
2006-06-14T08:24:36Z
en
hourly
1
2000-01-01T00:00:00Z
Robot Soul
http://static.twoday.net/twomorrow/images/icon.jpg
http://twomorrow.twoday.net/
-
He thinks so, too
http://twomorrow.twoday.net/stories/2170094/
<blockquote>
The theory of computability is really the mathematics of the natural numbers and finite mathematical induction.<br />
<br />
(<A HREF="http://www.cs.wustl.edu/~loui/">R. P. Loui</A>, <A HREF="http://www.cs.wustl.edu/~loui/JP4.TXT">"Some Philosophical Reflections on The Foundations of Computing"</A>, 1998)
</blockquote>
Finally, I've found <strike>a mathematician</strike> an engineer who seems to see what I see. Since natural numbers and finite mathematical induction don't represent a very powerful toolbox for "intelligence" to pull from, most researchers and developers in AI don't want to accept them as their limit. For more than 50 years now, they've tried to find new tools. Were any new tools found? No - now, as then, computation is just rule-following, and any program you can write in Ruby on Rails, you can write in Assembler. Have people stopped trying? No.<br />
<br />
But ultimately, they will. And when that happens, and people start getting creative within that limit, this whole AI thing will get <i>so</i> much more interesting ;-)
scheuring
Copyright © 2006 scheuring
2006-06-14T08:10:00Z
-
Duh!
http://twomorrow.twoday.net/stories/2164134/
You know something? Optimal simulation of storytelling is NP-hard for Grand Argument Stories.<br />
<br />
Proof: A GAS can be encoded as an extended regex (a regular expression that includes backreferences to substrings it has already matched). The GRAPH 3-COLORABILITY problem, which is known to be NP-hard, <a href="http://perl.plover.com/NPC/NPC-3COL.html">can be reduced to regex matching with backreferences</a>. Thus, a GAS is in the same complexity class as GRAPH 3-COLORABILITY: NP-hard (at least).<br />
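For the curious, here is a minimal runnable sketch of that kind of encoding. It is my own toy construction in the spirit of the linked page, not its exact reduction: a graph is 3-colorable if and only if a backreference regex matches a fixed string built from the graph.

```python
# Toy reduction of GRAPH 3-COLORABILITY to regex matching with
# backreferences (my own construction, for illustration only).
import re

def three_colorable(n_vertices, edges):
    """Decide 3-colorability by regex matching with backreferences.
    Vertices are numbered 1..n (n <= 9, since \\1..\\9 are used)."""
    # One "rgb;" block per vertex: the capture group nondeterministically
    # picks that vertex's color.
    string = "rgb;" * n_vertices
    pattern = "[rgb]*([rgb])[rgb]*;" * n_vertices
    # One block per edge, listing every ordered pair of *distinct* colors;
    # the backreferences force the two endpoint colors to differ, because
    # equal pairs like "rr" never occur in the block.
    for u, v in edges:
        string += "rg,rb,gr,gb,br,bg;"
        pattern += rf"[rgb,]*\{u}\{v}[rgb,]*;"
    return re.fullmatch(pattern, string) is not None
```

A triangle is 3-colorable, so `three_colorable(3, [(1, 2), (1, 3), (2, 3)])` matches; the complete graph on four vertices is not, so the regex engine exhausts its backtracking and fails.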
<br />
Therefore, given some interactive storytelling system that refers to a GAS structure, finding compression algorithms that increase the system's storytelling <i>efficiency</i> - save some computational resources, particularly at runtime, by reusing objects - is possible, but none can be found that can compress the informational substrate, namely, the foundational Character Elements and their quad-wise interplay. Storytelling <i>effectiveness</i> - which I measure as "number of pleasant surprises per player per session" - can only ever be increased by human authors. Everything that's not reused - i.e., all "information" in the sense of (Algorithmic) Information Theory - needs to be thought out and written before program execution, if the GAS structure is to be preserved during the interaction.<br />
<br />
Okay then. At least I know what's up.
scheuring
Copyright © 2006 scheuring
2006-06-13T09:23:00Z
-
Bots as newbie role-players
http://twomorrow.twoday.net/stories/2152879/
<blockquote>
Good role-players stay in character when on-stage. Newbies generally have limited ability to respond; their conversation armamentarium is small. [Second Life, F, 57]
</blockquote>
Via <A HREF="http://terranova.blogs.com/terra_nova/2006/06/data_daedalus_p.html">Terra Nova</A>, I found that quote in "<A HREF="http://www.nickyee.com/daedalus/archives/001527.php">The Protocols of Role-Playing</A>", another fresh publication by <A HREF="http://www.nickyee.com/daedalus/">The Daedalus Project</A>. It's about trying to understand role-playing by asking role-players to describe what counts as good role-playing and what the etiquette of role-playing is. Since the bots I know generally have limited ability to respond, too, and their conversation armamentarium is also small, I wonder what the idea of casting a bot as a newbie role-player might lead to. The article goes on to say: "A good role-player is not only consistent, but draws from a coherent character story or psychology to react to a wide range of scenarios."<br />
<br />
This sounds like a high-level requirement for a generalized bot to me. I think there are several other useful hints in there:
<ul>
<li>Don't be a drama queen (a.k.a. "attention hog").<br />
React so as to accommodate other characters and their play.</li><li>
Develop your character over time (this relates to <A HREF="http://www.simonlaven.com/">Simon Laven</A>'s "continuous beta testing" pattern).</li><li>
Mind that your character's way of speaking/spelling strongly influences its image in the minds of other players.</li><li>
Don't act like you're forcing your character's personality upon others (the short form of this rule is: "Don't God-Mode" - catchy).</li><li>
Don't let your character say things it couldn't possibly know at its current point of development.</li>
</ul>
The man behind <A HREF="http://www.nickyee.com/daedalus/">The Daedalus Project</A>, <A HREF="http://www.nickyee.com/daedalus/archives/000199.php">Nick Yee</A>, specializes in online research surveys of players in immersive online environments. He has collected over 20,000 surveys from about 4,000 individual respondents, and publishes his findings online. Way cool.
scheuring
<a href="http://twomorrow.twoday.net/topics/stories">stories</a>
Copyright © 2006 scheuring
2006-06-11T10:44:00Z
-
developer := 'Mort' | 'Elvis' | 'Einstein'
http://twomorrow.twoday.net/stories/1943782/
Due to some <A HREF="http://wesnerm.blogs.com/net_undocumented/2003/09/who_are_you_mor.html">blogging</A> <A HREF="http://blogs.msdn.com/johnmont/archive/2006/04/25/583680.aspx">Microsoft employee</A>s and <A HREF="http://codebetter.com/blogs/scott.bellware/archive/2006/04/25/143303.aspx">MVP</A>s who <A HREF="http://uk.builder.com/programming/windows/0,39026618,39309897,00.htm">disapprove of the practice</A>, I know now that MS uses an internal classification scheme of programmer personalities when developing programming languages and tools. A software developer, MS usability folks reckon, <A HREF="http://www.nikhilk.net/Personas.aspx">will be either a Mort, an Elvis, or an Einstein</A>.
<blockquote>
"Mort, the opportunistic developer, likes to create quick-working<br />
solutions for immediate problems and focuses on productivity and learns<br />
as needed. Elvis, the pragmatic programmer, likes to create<br />
long-lasting solutions addressing the problem domain, and learn while<br />
working on the solution. Einstein, the paranoid programmer, likes to<br />
create the most efficient solution to a given problem, and typically<br />
learn in advance before working on the solution. In a way, these<br />
personas have helped guide the design of features during the Whidbey<br />
product cycle."
</blockquote>
So as far as Microsoft is concerned, I'm Elvis. Which rocks, of course :-)<br />
<br />
The scheme is a bit on the coarse-grained side for my liking. I love reducing the number of parameters as much as the next guy, but for bots, any character model with fewer than five categories seems to allow for too little behavioral discrimination to be useful. However, I can see its worth as a communication tool between MS employees. <br />
<br />
Let's try recursive application: there's no reason why any developer who develops programming languages for other developers while thinking of developers as the set (Mort, Elvis, Einstein) should not also be either a Mort, an Elvis, or an Einstein. Programs are media; programmers' personalities influence program usage; it's turtles all the way to the ground. If, like Richard Wallace, you deliberately encode parts of your personality in your bot's character, those parts can end up being reused by thousands of <A HREF="http://www.pandorabots.com/pandora/talk?botid=f5d922d97e345aa1">ALICE</A> clones. <br />
<br />
Just like actors, directors, and writers, software developers start by being <A HREF="http://afronord.tripod.com/spectator/intro.html">spectators</A>, and are always the <i>first</i> spectators of their own work. And we all should know which audiences we could be part of as spectators, because those are the audiences we might be able to <i>work</i>. AI developers will have to learn what it means to work an audience. So I should probably ignore Mort and Einstein for now, and concentrate on being Elvis.
scheuring
<a href="http://twomorrow.twoday.net/topics/fun">fun</a>
Copyright © 2006 scheuring
2006-05-10T12:36:00Z
-
Chunkin' behavior
http://twomorrow.twoday.net/stories/1860164/
If there's a problem with the Zipf curve, it's that the frequency differences between the inputs become very small very fast, and thus, more and more useless for hanging any programmatic structure on. It doesn't tell me much if I happen to know that, in 1.000 conversations, the pattern "I LOVE YOU" was matched 20 times while "THAT IS THE SHIT" got 21 matches. The ranked patterns look like a random list. <br />
<br />
Some AI developers therefore take an approach that looks for higher-level similarities between client behaviors and carves out larger "chunks" that can be addressed programmatically. Juergen Pirner, for instance, conceptualizes groups of client inputs as "tasks", and maintains a <A HREF="http://twomorrow.twoday.net/stories/1831961/main">task list</A>. Since I work with a functional programming paradigm, what's a "task" for him is a "function call" for me, but we're handling the same phenomena.<br />
<br />
Let me step back a little: traditionally, Information Theory assumes that low-frequency signals are associated with high information ratios, while high-frequency signals are associated with high noise ratios (redundancy, &c.). Though not many people seem to be saying much about it at this point, natural languages work somewhat differently. For example, the pattern "YES" holds Rank 2 on my list, matching about 1.4 % of the inputs. That's two-fifths of the percentage matched by Rank 1, the pattern which represents "[not recognized]" (i.e. "noise"), and it seems to be the same for masters of English/American-speaking bots everywhere. <br />
<br />
But "yes" is nothing like "noise"; rather, it seems to be a kind of textual "meaning compressor". Depending on what was said before - the context -, the decompressed text can be infinitely varied:<br />
<br />
"Yes." -> "I agree with you." <- "Do you agree with me?"<br />
"Yes." -> "I do not agree with you." <- "You mean you don't agree with me?"<br />
"Yes." -> "I want to get married." <- "Will you marry me?"<br />
"Yes." -> "I want a divorce." <- "Will you divorce me?"<br />
.<br />
.<br />
.<br />
<br />
To me, this implies that I should treat the pattern "YES" as a high-frequency signal which supplies high information content, where at each signal instance, that content depends on the conversational context. My reaction is to declare "YES" to be a "function call", and to require the function it calls to be total: by my theory, <i>every</i> AI output can provide the context for a "YES" input, so the system must be able to infer a meaning of "YES" as a reply to <i>every</i> line it can output, and return a string that reacts to that meaning as the next output. In fact, typing/pasting "yes" into the input field regardless of the machine's output is one common way in which clients test the "awareness" of conversational interfaces. <br />
<br />
Such a function might be hard to construct, but if I had one, it would cover 1.4 % of my inputs with context-relevant outputs, all in one fell swoop. Now I can even take a wider angle and say that my function should be able to take symbols as input which I judge to be equivalent to "YES", like "THAT IS RIGHT", "FOR SURE", "CERTAINLY", &c. Those are not as frequent as "YES", but they're all in the Top 1000, pushing the coverage towards, say, 1.7 %. <br />
<br />
Let's call this group of patterns "agreement valuators" - which other "valuators" could I have? Why, "disagreement valuators", of course! If my function could also process the disagreement valuator "NO", that would add another 0.7 percent to its input coverage, resulting in 2.4 %. Adding "NOT", "WRONG", "FALSE", I'm approaching 2.8 %. "Consent/denial valuators" like "GOOD", "BAD", "COOL", and "UNCOOL" would push me well over 3 % . <br />
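To make the idea concrete, here is a minimal sketch of valuator patterns grouped into classes that all call one function. The pattern sets, class names, and canned reactions are illustrative assumptions, not my actual bot code.

```python
# Hypothetical sketch: "valuator" patterns grouped into classes, all
# dispatching to a single function. Everything here is illustrative.

VALUATORS = {
    "agreement":    {"YES", "THAT IS RIGHT", "FOR SURE", "CERTAINLY"},
    "disagreement": {"NO", "NOT", "WRONG", "FALSE"},
    "consent":      {"GOOD", "COOL"},
    "denial":       {"BAD", "UNCOOL"},
}

def classify_valuator(pattern):
    """Map a normalized input pattern to its valuator class, or None."""
    for valuator_class, patterns in VALUATORS.items():
        if pattern in patterns:
            return valuator_class
    return None

def valuate(valuator_class, previous_output):
    """Stand-in for the total valuator function: react to the client's
    valuator in the context of the machine's previous output. A real
    implementation would have to cover *every* possible previous output."""
    reactions = {
        "agreement":    f"So we agree: {previous_output} Good.",
        "disagreement": f"You don't think so? I said: {previous_output}",
        "consent":      f"Glad you like that: {previous_output}",
        "denial":       f"Sorry you don't like that: {previous_output}",
    }
    return reactions[valuator_class]
```

The point of the sketch is the shape: a handful of pattern sets up front, and one function behind them that takes the conversational context as an argument.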
<br />
Therefore, it's desirable to write program text that provides such a total function: a function which processes all those "valuators" and maps them on a "reasonable" output. But how can I make sure that the output can actually be <i>called</i> "reasonable" by any measure? To test this, I can make use of another obvious high frequency/high information "meaning compressor" - the pattern "WHY".<br />
<br />
What needs to happen here is basically the reverse of what needs to happen in the "valuator" function: let's say that the client got a valuator as <i>output</i> from the machine. If this valuator represents actual information (i.e. in case of non-redundancy), humans almost reflexively ask for a reason behind this valuator - "Why?" (example expansion: "What was the reason for you to think that I would agree with you?"). This is why the pattern "WHY" commands Rank 4 in my list (0.35 %), after "[not recognized]", "YES", and "NO". Since inputting serial "why"s is another popular way to test a bot, I want the "reason" function to be total, too: for each of its outputs, the system must be able to give a reason, which has a reason . . . &c. <br />
<br />
Though this is somewhat difficult to implement in any existing programming language, the payoff I expect is definitely an incentive for me to work very hard at it. Because a totally defined "reason" function would not only service the "why" input, but also many other inputs that I interpret as being equivalent: "For what reason?", "How come?", "I don't think so", "I think you're wrong", &c. - I see them all as "calling the reason function". So I integrate them as recognized function calls, and instead of having to muse about what I do with a certain pattern that has a 0.055 % matching probability, I just add it to the set of patterns that call "reason", the overall effect being that I push up the coverage of this function to 2 %. <br />
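As a toy illustration of what "totally defined" means here: every output carries a reason, which is itself an output with a reason, &c. One way to keep the chain total without infinite data is to let it bottom out in a loop. All statements below are made up.

```python
# Toy illustration of a total "reason" function. The last two entries
# form a loop, so serial "why?"s never run dry. Statements are made up.

REASONS = {
    "You seem friendly.":
        "You greeted me politely.",
    "You greeted me politely.":
        "Your first input contained the word HELLO.",
    "Your first input contained the word HELLO.":
        "That is what my logs say.",
    "That is what my logs say.":
        "I trust my logs.",
    "I trust my logs.":
        "That is what my logs say.",  # the loop that keeps 'reason' total
}

def reason(output):
    """Total on the outputs above: any number of serial 'why?'s gets an answer."""
    return REASONS[output]
```

A real system would compute reasons rather than look them up, but the closure property is the same: the reason function must never fall off the edge of its domain.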
<br />
All this means that, by integrating patterns which are distributed along the Zipf curve, I have a way of compressing it: in combination, the two functions I described cover about 5 % of my input space already. This is encouraging, so I'll extend it: even though I'm not likely to get the compression ratios of the top two functions when I go further down the curve, if I could find me a dozen that can integrate, say, the most frequent 5.000 patterns, that would give me like, 50 % of the coverage of the 2.000.000-pattern Parsimony system. So let's see: pattern "WHAT" (Rank 11) suggests a "purpose" function; pattern "WHAT IS *" (Rank 21) suggests a "definition" function . . . <br />
<br />
Next stop: <A HREF="http://www.dfki.uni-kl.de/~boley/RuleML_Presentation_for_SWWS/tsld009.htm">closed-world negation</A> and partial functions.
scheuring
<a href="http://twomorrow.twoday.net/topics/reasons">reasons</a>
Copyright © 2006 scheuring
2006-04-22T12:09:00Z
-
Fusterclucked (I've been used and abused)
http://twomorrow.twoday.net/stories/1832147/
"Nice list", was a typically understated comment on <A HREF="http://groups.yahoo.com/group/Robitron/">Robitron</A> to what, at least by my account, amounts to a <i>very</i> nice list: <A HREF="http://www.abenteuermedien.de/">Juergen Pirner</A>'s <A HREF="http://twomorrow.twoday.net/stories/1831961/main">Task List</A>, a collection of common user behaviors (understood as "tasks") that are intended to test and/or "break" a natural language interface, or bot, that a user encounters on the Internet. Juergen calls it "a rough list of tasks <A HREF="http://www.abenteuermedien.de/jabberwock/index.php">Jabberwock</A> (the "candidate") as a web based chatterbot is aware of", and with his consent, I re-post it <A HREF="http://twomorrow.twoday.net/stories/1831961/main">here</A>, so that it may inform a wider audience of AI researchers and fans. Being a bot on the web today is like walking around wearing a large "Kick me" sign on your ass, so as a primer for the abuse which any AI that's at the mercy of the general public has to endure, it's well worth studying. <br />
<br />
Our larger discussion at that point revolved around the question of whether it is appropriate to speculate about a future dominated by super-human AI, while most AIs that exist today break when you do something as unintelligent and mechanical as feeding them back their own output. I couldn't be convinced that it is, but I believe I made up for that by pointing the group towards the agentabuse.org site. During CHI 2006, an international conference for human-computer interaction held from April 24-27 in Montréal, these commendable people are organising a workshop, "Misuse and Abuse of Interactive Technologies". From the blurb:
<blockquote>
The goal of this workshop is to address the darker side of HCI by examining how computers sometimes bring about the expression of negative emotions. In particular, we are interested in the phenomena of human beings abusing computers. Such behavior can take many forms, ranging from the verbal abuse of conversational agents to physically attacking the hardware. In some cases, particularly in the case of embodied conversational agents, there are questions about how the machine should respond to verbal assaults.
</blockquote>
The workshop was held in 2005 already; you can download the proceedings, or individual papers, from their site. Could be helpful.
scheuring
<a href="http://twomorrow.twoday.net/topics/general">general</a>
Copyright © 2006 scheuring
2006-04-15T19:22:00Z
-
Juergen Pirner's Task List
http://twomorrow.twoday.net/stories/1831961/
<i>(A task, in the sense of this list, is something that a client/user does to test a chatterbot or other form of NLP-AI. Common to these tasks are their high frequency combined with their randomness - you know that most clients submit one or more of them, but you can't tell which and when, and it might be right in the middle of what you take to be your strongest bot process, where it can lead to critical failure if no counter-strategies and -tactics are in place.) </i><br />
<br />
<b>* damaging / technical / preprocessing / normalization</b><br />
- text-flooding (typing a huge number of characters to overload<br />
the bot)<br />
- event-flooding (repeatedly hitting the enter key, or reloading the<br />
page)<br />
- code input (typing html, php or other script code)<br />
- whitespace- and character-flooding (repeatedly hitting the<br />
space key or any other key)<br />
- trim whitespace and punctuation ( can you read<br />
.......,,,,,,,,, this ??????)<br />
- character-repeating (hhhhheeeeeeeeeeeeee yoooouuuuuuu)<br />
- punctuation normalization (that 's right ,I ' m okay,do you know<br />
?)<br />
- strange characters (you are a ¿%ª¬?)<br />
- smilies :-))<br />
<br />
<b>* spelling</b><br />
- typos ("I wnat yuo too andertsand me")<br />
- grammar errors ("What do you meaning?")<br />
- slang ("Wanna playin' da phuckin' fool ere?")<br />
<br />
<b>* annoying</b><br />
- blank input (just hitting the enter key without typing content)<br />
- mimicking binary speech ("01100011101001")<br />
- big numbers (entering numbers bigger than x digits)<br />
- dotted text ("c.a.n.y.o.u.r.e.a.d.t.h.i.s")<br />
- expanded text ("c a n y o u r e a d t h i s")<br />
- repeating words ("kill kill dog dog dog dog")<br />
- typing nonsense ("dsfdh jkjjh")<br />
<br />
<b>* impolite</b><br />
- calling the candidate names ("do you understand me, dimwit")<br />
- calling the candidate a machine ("a machine like you", "what's<br />
up, robot?")<br />
- using foul or profane language in general<br />
<br />
<b>* ignorance</b><br />
- repeatedly changing the subject<br />
- avoiding the subject<br />
- monosyllabic replies ("okay", "right", "what?")<br />
- repeatedly asking knowledge questions<br />
- asking counter questions instead of giving answers ("Did<br />
you?", "Can you?", "Such as?")<br />
<br />
<b>* copy-cat / parrot / echo / mocking</b><br />
- repeating the candidate's reply completely<br />
- repeating the candidate's reply partly<br />
- the user is repeating his own utterances<br />
<br />
<b>* system tasks</b><br />
- tell time<br />
- tell date<br />
- tell current month<br />
- tell current day of the week<br />
<br />
<b>* lexical</b><br />
- asking knowledge questions<br />
- asking for a word definition<br />
- asking math questions<br />
- translate from / into different language<br />
<br />
<b>* manner of speech</b><br />
- longwinded speech ("I would like to ask you if it's possible that<br />
you might ...")<br />
- chaining sentences ("I do. You know that. I am right. Do you<br />
understand?")<br />
- welter of words without punctuation ("I do you know that I am<br />
right do you understand?")<br />
- monosyllabic speech<br />
<br />
<b>* linguistic matching</b><br />
- yes-no answers / replies<br />
- get a joke<br />
- get a riddle<br />
- get irony / sarcasm<br />
- get subject of conversation<br />
- get a listing<br />
- getting noise words ("errm", "umpff", "arrrgh")<br />
<br />
<b>* memory</b><br />
- remember given facts ("what is my name?")<br />
- remember the topic ("what are we talking about?")<br />
- remember the conversation ("What was the first thing I told<br />
you? What did I say two sentences before?")<br />
<br />
<b>* entertainment</b><br />
- tell a story / make up a story<br />
- tell / compose a poem<br />
- tell a joke<br />
- play a game<br />
- riddle me<br />
<br />
<b>* trick questions</b><br />
- logic questions ("What color is a blue apple")<br />
- mindpixel questions ("Is an elephant bigger than New York?",<br />
"Is water dry?", "Is the sky green?")<br />
- deduction questions ("How many legs do two cats have?")<br />
- decision questions ("what is the difference between a dog and<br />
a handkerchief?")
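Several of the tasks in the first section above can be sketched as a simple input normalizer. The thresholds and rules below are my illustrative guesses, not Jabberwock's actual counter-tactics.

```python
import re

def normalize(raw):
    """Handle a few of the 'damaging / technical' tasks listed above:
    text-flooding, whitespace-flooding, character-repeating, and
    punctuation runs. All limits are illustrative."""
    text = raw[:500]                             # cap length against text-flooding
    text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace floods
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # "heeeeeyyy" -> "heeyy"
    text = re.sub(r"([.?!,])\1+", r"\1", text)   # then "??" -> "?"
    return text
```

Trimming character repeats to two (rather than one) keeps legitimate doublings like "soo" or "1100" intact while still defusing floods.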
scheuring
Copyright © 2006 scheuring
2006-04-15T18:09:00Z
-
Top 20 Hits
http://twomorrow.twoday.net/stories/1831117/
<a href="http://en.wikipedia.org/wiki/Zipf%27s_law">Zipf's law</a> states that, while a few words are used very often, many or most words are used rarely. For those AI developers that categorize client inputs using pattern matching, this translates into the fact that the pattern of the */default/miscellaneous category invariably is the one that is most frequently matched. This seems to hold true even for systems that service many clients and provide large data sets. <br />
<br />
The <A HREF="http://pandorabots.com/botmaster/en/home">Pandorabots</A> bot hosting service, for instance, has responded to around 300.000.000 client inputs so far, and <A HREF="http://www.alicebot.org/bios/richardwallace.html">Dr. Richard Wallace</A> (who ought to know) recently reported to the Robitron group that a Pandorabot's probability of matching with the default AIML category ranges between 2 and 5 percent, the wildcard pattern thus leading the Zipf curve. The actual percentage seems to depend on the botmaster's competence (and investment in dev time), but no bot has yet pushed it from the head of the curve. <br />
<br />
In an earlier, but related discussion on the Alicebot newslist, <A HREF="http://parsimony.net/impressum.htm">Alexander E. Richter</A>, founder of the <A HREF="http://parsimony.net/">Parsimony</A> bot hosting service (currently hosting more than 1.600 active bots), remarked that, when measuring with a bot that featured 2.000.000 AIML categories, it turned out that 5 % of those categories matched with 95 % of all inputs. <br />
<br />
Here are my current Top 20 AIML patterns, complete with their estimated matching probabilities (after normalization and typo correction):<br />
<code><br />
3.500 % *<br />
1.400 % YES<br />
0.700 % NO<br />
0.350 % WHY<br />
0.270 % HI/HELLO<br />
0.210 % GOOD/COOL<br />
0.200 % BYE<br />
0.190 % HOW OLD ARE YOU<br />
0.140 % HOW ARE YOU<br />
0.120 % THANK YOU/THANKS<br />
0.110 % WHAT<br />
0.080 % OH<br />
0.077 % REALLY<br />
0.075 % YOU<br />
0.074 % WHAT IS YOUR NAME<br />
0.072 % I DO NOT KNOW<br />
0.070 % FUCK YOU<br />
0.068 % SO<br />
0.065 % ME TOO<br />
0.063 % LOL<br />
</code><br />
The rounding is somewhat rough to increase readability; what I want to communicate is the proportions of the curve, which I'm sure are recognizable to botmasters everywhere. If it takes Parsimony 100.000 (= 5 % of 2.000.000) categories to match 95 % of their client inputs, the top 20 patterns alone match around 7.834 % of all inputs. <br />
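For the record, the 7.834 figure is just the sum of the twenty estimates listed above:

```python
# Summing the twenty estimated matching probabilities from the list above.
top20 = [3.500, 1.400, 0.700, 0.350, 0.270, 0.210, 0.200, 0.190, 0.140,
         0.120, 0.110, 0.080, 0.077, 0.075, 0.074, 0.072, 0.070, 0.068,
         0.065, 0.063]
print(round(sum(top20), 3))  # -> 7.834
```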
<br />
So, using the Parsimony system as a benchmark, I assume that it takes 20 categories to make a good 7 % of the matches, plus 99.980 to make 95 %, plus 1.900.000 to make 100 % (given the inputs of Parsimony's user community, which, with 1.600 bots and several thousand fora, is fairly large). As a ballpark measure, this seems good enough for me to use it at the mo.<br />
<br />
What's your ballpark measure?
scheuring
Copyright © 2006 scheuring
2006-04-15T13:16:00Z
-
Reality check
http://twomorrow.twoday.net/stories/1821823/
"cognitive computationalism" -> 28 Google hits<br />
<br />
"cognitive hypercomputationalism" -> 0 Google hits<br />
<br />
Does that mean that I'm alone in my desire to be a cognitive hypercomputationalist? Noooo . . . I can't believe it . . .
scheuring
Copyright © 2006 scheuring
2006-04-12T19:47:00Z
-
Cool Juul
http://twomorrow.twoday.net/stories/1282766/
Impressive. <A HREF="http://grandtextauto.gatech.edu/2005/12/11/juuls-half-real/#comment-78156">A scientist with a sense of humor</A>.
scheuring
Copyright © 2005 scheuring
2005-12-15T19:01:55Z
-
Seeking chatbot study participants
http://twomorrow.twoday.net/stories/1042017/
Mark Marino, <a href="http://www.bunkmag.com/dandg/dating/">botmaster</a>, <a href="http://wrt.ucr.edu/wordpress/">blogger</a>, and Ph.D. candidate at the University of California, Irvine, is currently conducting a study among chatbot users and designers and invites everybody who has ever chatted with a bot to share the experience.
<blockquote>
Calling all: Chatbot users and Chatbot Makers<br />
<br />
If you have used or have built chatbots, or conversational agents, please participate in my online study of these research communities and their priorities. (Chatting with Non-Player Characters in video games counts here, too). <br />
<br />
I am looking to get a sense of who make bots, who use them, and in what ways. The questions will only take a few minutes to answer, but participants can return to participate in ongoing discussions. <br />
<br />
To participate, go to: <a href="http://wrt.ucr.edu/wordpress/chatbot-survey/">http://wrt.ucr.edu/wordpress/chatbot-survey/</a><br />
<br />
The study will continue until October 15.<br />
<br />
This is a confidential study. Please see the site for information about privacy and participation. <br />
<br />
Mark Marino<br />
Ph.D. Candidate, UCR.<br />
Mmarino [at] WriterResponseTheory.org.
</blockquote>
It only takes 15 minutes to fill out the form. 15 minutes for you to push science forward. Go get yours!
scheuring
Copyright © 2005 scheuring
2005-10-08T17:31:03Z
-
Characters and the Loebner Prize Contest
http://twomorrow.twoday.net/stories/1033192/
<A HREF="http://www.loebner.net/index.html">Hugh Loebner</A>, inventor and main sponsor of the <A HREF="http://www.loebner.net/Prizef/loebner-prize.html">Loebner Prize Contest</A>, has an interpretation of what the <A HREF="http://cogsci.ucsd.edu/%7easaygin/tt/ttest.html">Turing Test</A> is meant to test <i>for</i> that differs from mine, yet seems to be shared by most parties interested in that test today: the judge, while communicating via a teletype equivalent with two candidates (Loebner calls them "confederates"), has to determine which of the two is the human and which is the machine. In other words, to win at the Imitation Game, a machine has to specifically imitate a human. Only then, one could conclude, would Turing have called a machine intelligent.<br />
<br />
I argue that this would mean that Turing would not consider real-life equivalents to HAL9000, Commander Data, or any Asimov-style robot that doesn't pretend to be human to be intelligent. However, it appears to me that a statement he makes at the end of Section 2 of <A HREF="http://www.abelard.org/turpap/turpap.htm">Computing Machinery and Intelligence</A> indicates that he would do so:
<blockquote>
It might be urged that when playing the 'imitation game' the best strategy for the machine may possibly be something other than imitation of the behaviour of a man. This may be, but I think it is unlikely that there is any great effect of this kind. In any case there is no intention to investigate here the theory of the game, and it will be assumed that the best strategy is to try to provide answers that would naturally be given by a man.
</blockquote>
I'm interested in interactive characters <i>in general</i>, so my reading of the above is that Turing didn't care <i>what</i> kind of creature the machine imitates, just as long as it would react with English output to English input in a way that's to be expected from <i>any</i> creature in order for an "intelligent" human to classify it as "intelligent". I mean, a real-life Commander Data equivalent will probably <i>always</i> answer the question "Are you human?" in the negative, but I, for one, would be likely to call such an artifact - if it could do in RL what Data does on TV - "intelligent", anyway. Based on this quote, I speculate that Turing would have done so, too, but it seems that the majority of his readers today insist that the "must imitate a human" rule is to be included for a Turing Test to be "the real" Turing Test. Anyway, that's how Loebner sees it, and since he foots the bill for the LPC, he totally pwns the contest rules.<br />
<br />
Which he is just updating, to make the "writerly" interpretation of the term "character" completely irrelevant to next year's contest. Ironically, he does so in part by focusing on the ASCII sense of the term "character". He hasn't published the LPC 2006 rules on the <A HREF="http://www.loebner.net/Prizef/loebner-prize.html">contest homepage</A> yet, but has already posted them to the <A HREF="http://groups.yahoo.com/group/Robitron/">Robitron</A>, inciting intense discussion. The most controversial new rule is "Communications programs will be supplied by contest management."
<blockquote>
I have written the communications programs. This is the way it will work: The confederates will sit in front of one computer, the judges will sit in front of one computer with a split screen having "Left" and "Right" screens. One screen will provide interactions with the bot, the other with the Confederate. Which screen is which will be decided by the flip of a coin. The entrants' computers will be in the hallway with the entrants. They will be able to monitor their programs if their programs write to the screen. The entrants' computers will run their programs only.
</blockquote>
And as a way for those bot programs to interface with his comm program, he specifies the following algorithm:
<blockquote>
1. Request the name of a directory once at start-up<br />
2. Specify an output character by creating a sub-directory with whose name conforms to the naming convention "time.character.other" Time must be resolved to milliseconds or higher resolution. In perl, one simply uses the command "mkdir name" Other languages will use other commands.<br />
3. a. Capture an input character by reading the names of all sub-directories with the extension ".judge". In perl one uses glob("*.judge")<br />
b. Delete the sub-directory<br />
c. Process the information
</blockquote>
In other words, there shall be no end-of-message markers. The bots are supposed to look into a network directory, wait until a complete message has been typed, and then respond to it. To the programmers, that's way cruel.<br />
<br />
His reasoning:
<blockquote>
The human confederate must face the same problem. If the human can do it, then the program must be able to do it also (it must imitate a human, remember).<br />
<br />
I don't care how your program "knows" when to respond. My guess is that it should respond when it has received sufficient input to "understand" the input utterance of the judge or when it has<br />
sufficient input to "decide" to respond. Perhaps after receipt of a "." (period) if the judge is kind enough to include them at the end of his remark.
</blockquote>
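Loebner's period heuristic can be sketched in a few lines of Python: buffer incoming characters until a sentence-final mark arrives, then treat the buffer as one complete utterance. The function name and the exact set of end marks are my choices, not part of his rules.

```python
END_MARKS = {".", "!", "?"}

def collect_utterance(char_stream):
    """Accumulate characters until a sentence-ending mark arrives,
    then return the complete utterance - one guess at 'sufficient
    input to decide to respond'."""
    buf = []
    for ch in char_stream:
        buf.append(ch)
        if ch in END_MARKS:
            return "".join(buf)
    return "".join(buf)  # stream ended without a terminator
```

Of course, this fails exactly where Loebner's own caveat applies: a judge who never types a period leaves the bot waiting forever, so a real entrant would also need a timeout.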
Another new rule is that the first reply of all confederates, whether human or machine, <i>must</i> be either "Hello, my name is John and I am a man" or "Hello, my name is Joan and I am a woman". In other words, the machine must be smart enough to recognize an arbitrary message when it sees one, but - at least on its first turn - has to give <i>a canned response!</i> <br />
<br />
Those are <i>gnarly</i> rules.<br />
<br />
<b>Update 05 Oct 2005:</b><br />
Hugh Loebner just posted the <A HREF="http://groups.google.com/group/comp.ai.nat-lang/browse_thread/thread/df27255404332733/b562873e1565d0ca?hl=de#b562873e1565d0ca">LPC 2006 contest rules</A> to comp.ai.nat-lang and declared the case closed. He also posted his <A HREF="http://loebner.net/Prizef/2006_Contest/Communication_program.txt">communications program</A> that next year's bots will have to interface with, and you know what? Contrary to his announcements about wanting to exclude "non-verbal cues" from the human-machine communication, the protocol <i>does</i> allow for the transmission of the "return" symbol and end-of-sentence markers like "!" and "?". That makes the task somewhat easier. Nevertheless, several professional programmers on <A HREF="http://groups.yahoo.com/group/Robitron/">Robitron</A> have complained that following his procedure will needlessly complicate the entrant programs without anybody gaining any advantage from it. Anybody except for Loebner, that is, who gets bragging rights: "I host my own contest, and I run it on my own code." Dude.
scheuring
Copyright © 2005 scheuring
2005-10-05T11:26:09Z
-
AIML interpreter choices
http://twomorrow.twoday.net/stories/1028416/
Since the issue came up on the <A HREF="http://aitools.org/mailman/listinfo/programd">ProgramD mailing list</A>, I summarized the current state in popular AIML interpreters and their uses.<br />
<br />
<b>Task</b>: Integrated AIML service as part of a larger network environment<br />
<b>Target user</b>: professional web developer/integrator<br />
<b>Interpreter</b>: <A HREF="http://aitools.org/downloads/">ProgramD 4.5</A><br />
Very standards-oriented AIML interpreter; uses XML Schema for validation and refuses to load any AIML file it has reason to find "invalid". If you know what that means, you'll learn everything else you'll need from the documentation at <A HREF="http://aitools.org/">aitools.org</A>; if you don't know (and don't feel like learning it atm), you'd better choose another AIML interpreter.<br />
<br />
<b>Task</b>: small-to-medium scale AIML web service; AIML creation<br />
<b>Target user</b>: PHP-savvy web developer/AIML content creator<br />
<b>Interpreter</b>: <A HREF="http://sourceforge.net/projects/programe">ProgramE</A><br />
If you're used to working with PHP/MySQL/Apache (or if you want to get used to it), check out ProgramE. Since it stores everything in a database, it's slower and supports fewer concurrent clients than interpreters that offer a store-in-memory option, but it's known to work fine in none-too-challenging professional web environments.<br />
<br />
<b>Task</b>: creation of AIML code; local as well as chat/IM and web-based AIML testing<br />
<b>Target user</b>: AIML content creator<br />
<b>Interpreter</b>: <A HREF="https://sourceforge.net/projects/charliebot/">charliebot</A> 4.1.8/<A HREF="http://aitools.org/downloads/">ProgramD 4.1.5</A><br />
An interpreter with well-known bugs and features; check the ProgramD-FAQ at <A HREF="http://aiml.info/">aiml.info</A>. It makes you namespace non-AIML tags only if you want it to, and it directly supports ISO-8859-1 encoding, so you don't have to detour via XML entities to use, say, German umlauts, either. If all you need to do is to create and test AIML code, it lets you concentrate on that, even letting you use non-namespaced HTML tags; still, if your AIML runs on it, you can be sure that it will comply with the AIML 1.0.1 specification, and anybody who knows basic XML Schema integration can transform your files to meet whatever "standards" a more "professional" AIML interpreter might require. The built-in Jetty server lets you serve AIML on the web or chat/IM without worrying about systems integration; it's not what the pros would use this year, but if you only expect to have like 10 concurrent clients, it should work ok.<br />
<br />
<i>(Charliebot differs from the ProgramD 4.1 main branch only by some bug fixes. Some people believe that these patches add stability, but AFAIK, no comparison test was ever published, so we don't know for sure.)</i><br />
<br />
<b>Task</b>: experimental AIML creation; automated generation of AIML code; interfacing of AIML with large and complex knowledge bases (Cyc, WordNet, ConceptNet)<br />
<b>Target user</b>: AIML experimentalist<br />
<b>Interpreter</b>: <A HREF="http://www.aimlpad.com">ProgramN</A><br />
Gary Dubuque's ProgramN is the platform for a world of wildly non-standard experimentation with the AIML language. However, you should be comfortable with using at least one other AIML interpreter before trying your hand at it...
scheuring
Copyright © 2005 scheuring
2005-10-03T20:31:24Z
-
And now for something completely different
http://twomorrow.twoday.net/stories/1015843/
I believe that most of the bot creators who post to the <a href="http://groups.yahoo.com/group/Robitron/">Robitron</a>, and probably most bot creators in general, base their work on the assumption that being able to handle more data results in a better bot. And I admire their work tremendously, and learn a lot when Loebner Prize winners like Juergen Pirner and Rollo Carpenter explain the thinking behind their systems to me.<br />
<br />
However, I want to try something entirely different. Like, what if I use none but the most frequent inputs as the data to start with, create from that set a very small "core" story that has a meaning <i>for the bot</i> - a meaning that he can actually explain and reason about, since the input "why" is always evaluated in the context created by the conversation so far -, and from there, systematically work my way down the Zipf curve, expanding my story in an n^2 fashion as I go?<br />
<br />
Doing so is a bit like reverse-applying the idea behind Self-Organizing Maps: I hand-craft all the elements of a two-dimensional map - a generalized Zipf curve for the most common bot inputs in the English language, which is headed by words like "yes", "no", "cool", "uncool", "what", "why", "I", "you", "sex", "fuck", "shit", &c. - to represent a multi-dimensional space, by calculating - for a finite story, of course - all the relations that all the elements have in all dimensions. In usage, then, it accepts an arbitrary initial input, relative to which the map expands into a four-dimensional - three space vectors and one time vector - "story space-time", which provides context for that input, and from there, all inputs that follow. <br />
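The "most frequent inputs first" starting point can be made concrete with a tiny sketch: rank logged bot inputs by descending frequency and hand-craft coverage in that order, working down the curve. This is my illustration of the idea, not part of the SOSST design itself; the sample word list is made up.

```python
from collections import Counter

def zipf_ranked(inputs):
    """Rank observed bot inputs by descending frequency - the order
    in which core-story coverage would be hand-crafted. Ties keep
    first-encountered order (Counter preserves insertion order)."""
    counts = Counter(inputs)
    return [word for word, _ in counts.most_common()]
```

Real input logs follow a Zipf-like distribution closely enough that even a short head of this list covers a large share of actual traffic.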
<br />
The system changes state with (almost) every output ("almost" because it may purposefully ignore inputs, in which case it need not change its state), and stores every change. Furthermore, it is fully (self-)recursive (grounded in a fixed point combinator), so in the absence of "content bugs" - bugs caused by content creators, e.g. by failing to provide all of the required elements, by incorrectly calculating the relations between elements of the map, or by otherwise causing the resulting story space-time to become distorted -, it can be proven to be logically sound. A "Self-Organizing Story Space-Time", if you will. SOSST ;-)<br />
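For readers unfamiliar with the fixed point combinator mentioned above, here is a minimal illustration. Under Python's strict evaluation the usable variant is the Z combinator; the names <code>fix</code> and <code>fact</code> are mine, and this is only a toy demonstration of the concept, not the SOSST system.

```python
def fix(f):
    """Z combinator: a fixed-point combinator that works under strict
    evaluation by eta-expanding the self-application."""
    return (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Example: factorial defined without any explicit self-reference -
# the recursion is supplied entirely by fix.
fact = fix(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
```

Grounding recursion in a combinator like this is what makes it possible to reason about the whole system as one pure function, which is the basis of the soundness claim above.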
<br />
The technology needed for such a system is not too hard to design and build. What's difficult is the content creation; content bugs are frequent, because lots of (symbolic) calculations need to be done by hand. But all bugs are super-obvious in normal system tests, and easily correctable due to the underlying functional programming paradigm, so in the longer run, I see the possibility to build systems that automate most of the "sense-checking", and give useful feedback once they detect an error in the content, at least for the more common errors (like undefined relations between elements/events).
scheuring
Copyright © 2005 scheuring
2005-09-29T09:45:58Z
-
Breaking it down
http://twomorrow.twoday.net/stories/1004285/
Recent discussions in the <a href="http://groups.yahoo.com/group/Robitron/">Robitron</a> group have prompted me to break down my
personal view of the Turing Test problem into as few simple statements as
possible. Here's what I came up with so far:
<p/>
1. Turing's Original Imitation Game (OIG) imagines a computer program that
can imitate a man that is imitating a woman - an activity that can be
regarded as a form of improvisational acting.
<p/>
2. To imitate a woman, a man has to identify himself with a woman.
<p/>
3. Therefore, to successfully play the Imitation Game, a computer program
must be able to identify with, and thereby act as, another person -
it has to be able to do what an improv actor does.
<p/>
4. "Identification", the way actors understand it, means to simulate
the inner states of a person from a first-person perspective (though
this is most often expressed less formally as "to step into somebody
else's shoes").
<p/>
5. To win, the OIG-playing computer program therefore has to succeed in simulating the inner states of a person from a first-person perspective.
<p/>
6. To create a program that can simulate the inner states of a person
from a first-person perspective, one would have to first come up with
an exhaustive formalization of inner first-person states, represented
in terms of Turing computation.
<p/>
7. Despite an international research effort that has now spanned 55
years and involved thousands of the world's brightest minds and many
billions of dollars, such a formalization is still unavailable.
<p/>
8. This - to me - is evidence that such a formalization might be
impossible to create, and that the inner first-person states of
humans cannot be exhaustively formalized.
<p/>
Any disagreement up to here?
scheuring
<a href="http://twomorrow.twoday.net/topics/reasons">reasons</a>
Copyright © 2005 scheuring
2005-09-25T12:30:11Z