Saturday, 15. April 2006

Fusterclucked (I've been used and abused)

"Nice list", was a typically understated comment on Robitron to what, at least by my account, amounts to a very nice list: Juergen Pirner's Task List, a collection of common user behaviors (understood as "tasks") that are intended to test and/or "break" a natural language interface, or bot, that a user encounters on the Internet. Juergen calls it "a rough list of tasks Jabberwock (the "candidate") as a web based chatterbot is aware of", and with his consent, I re-post it here, so that it may inform a wider audience of AI researchers and fans. Being a bot on the web today is like walking around wearing a large "Kick me" sign on your ass, so as a primer for the abuse which any AI that's at the mercy of the general public has to endure, it's well worth studying.

Our larger discussion at that point revolved around the question of whether it is appropriate to speculate about a future dominated by super-human AI, while most AIs that exist today break when you do something as unintelligent and mechanical as feeding them back their own output. I couldn't be convinced that it is, but I believe I made up for that by pointing the group towards the agentabuse.org site. During CHI 2006, an international conference for human-computer interaction held from April 24-27 in Montréal, these commendable people are organising a workshop, "Misuse and Abuse of Interactive Technologies". From the blurb:
The goal of this workshop is to address the darker side of HCI by examining how computers sometimes bring about the expression of negative emotions. In particular, we are interested in the phenomena of human beings abusing computers. Such behavior can take many forms, ranging from the verbal abuse of conversational agents to physically attacking the hardware. In some cases, particularly in the case of embodied conversational agents, there are questions about how the machine should respond to verbal assaults.
The workshop was first held in 2005; you can download the proceedings, or individual papers, from their site. Could be helpful.

Top 20 Hits

Zipf's law states that, while a few words are used very often, many or most words are used rarely. For those AI developers that categorize client inputs using pattern matching, this translates into the fact that the pattern of the */default/miscellaneous category invariably is the one that is most frequently matched. This seems to hold true even for systems that service many clients and provide large data sets.
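To illustrate what "categorizing client inputs using pattern matching" with a default category means in practice, here is a minimal sketch; the patterns and replies are invented for the example, not taken from any real bot:

```python
# Minimal sketch of AIML-style input categorization with a "*" fallback.
# The patterns and replies below are hypothetical examples, not taken
# from any real bot's data set.

def normalize(text):
    """Uppercase and strip punctuation, as AIML matchers typically do."""
    return "".join(c for c in text.upper() if c.isalnum() or c.isspace()).strip()

CATEGORIES = {
    "HI": "Hello there!",
    "HOW ARE YOU": "I am fine, thanks.",
    "WHAT IS YOUR NAME": "My name is Bot.",
}

DEFAULT_REPLY = "Interesting. Tell me more."  # the "*" category

def respond(client_input):
    key = normalize(client_input)
    # Try a specific category first; anything unmatched falls
    # through to the default/miscellaneous category.
    return CATEGORIES.get(key, DEFAULT_REPLY)

print(respond("How are you?"))  # matches a specific category
print(respond("Quux blorb."))   # falls through to "*"
```

With realistic traffic, the second case dominates the long tail: most distinct inputs never match a specific pattern, which is exactly why the "*" category sits at the head of the match-frequency curve.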

The Pandorabots bot hosting service, for instance, has responded to around 300.000.000 client inputs so far, and Dr. Richard Wallace (who ought to know) recently reported to the Robitron group that a Pandorabot's probability of matching with the default AIML category ranges between 2 and 5 percent, the wildcard pattern thus leading the Zipf curve. The actual percentage seems to depend on the botmaster's competence (and investment in dev time), but no bot has yet pushed it from the head of the curve.

In an earlier, but related discussion on the Alicebot newslist, Alexander E. Richter, founder of the Parsimony bot hosting service (currently hosting more than 1.600 active bots), remarked that, when measuring with a bot that featured 2.000.000 AIML categories, it turned out that 5 % of those categories matched with 95 % of all inputs.

Here are my current Top 20 AIML patterns, complete with their estimated matching probabilities (after normalization and typo correction):

3.500 % *
1.400 % YES
0.700 % NO
0.350 % WHY
0.270 % HI/HELLO
0.210 % GOOD/COOL
0.200 % BYE
0.190 % HOW OLD ARE YOU
0.140 % HOW ARE YOU
0.120 % THANK YOU/THANKS
0.110 % WHAT
0.080 % OH
0.077 % REALLY
0.075 % YOU
0.074 % WHAT IS YOUR NAME
0.072 % I DO NOT KNOW
0.070 % FUCK YOU
0.068 % SO
0.065 % ME TOO
0.063 % LOL

The rounding is somewhat rough to increase readability; what I want to communicate are the proportions of the curve, which I'm sure are recognizable to botmasters everywhere. If it takes Parsimony 100.000 (= 5 % of 2.000.000) categories to match 95 % of their client inputs, around 7.834 % of those matches are made by the top 20 patterns.

So, using the Parsimony system as a benchmark, I assume that it takes 20 categories to make a good 7 % of the matches, plus 99.980 to make 95 %, plus 1.900.000 to make 100 % (given the inputs of Parsimony's user community, which, with 1.600 bots and several thousand fora, is fairly large). As a ballpark measure, that seems good enough for me to use at the mo.
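For anyone who wants to check the arithmetic, here it is in a few lines of Python (the percentages are the estimated matching probabilities from the Top 20 list above):

```python
# Check the ballpark figures from the text. The percentages are the
# estimated matching probabilities of the Top 20 patterns listed above.
top20 = [3.500, 1.400, 0.700, 0.350, 0.270, 0.210, 0.200, 0.190,
         0.140, 0.120, 0.110, 0.080, 0.077, 0.075, 0.074, 0.072,
         0.070, 0.068, 0.065, 0.063]

top20_share = sum(top20)
print(round(top20_share, 3))  # 7.834 -> share of inputs matched by just 20 patterns

# Parsimony benchmark: 5 % of 2,000,000 categories match 95 % of inputs.
total_categories = 2_000_000
head = int(total_categories * 0.05)
print(head)       # 100000 categories for 95 % of inputs
print(head - 20)  # 99980 categories for the remaining ~87 percentage points
```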

What's your ballpark measure?

Wednesday, 12. April 2006

Reality check

"cognitive computationalism" -> 28 Google hits

"cognitive hypercomputationalism" -> 0 Google hits

Does that mean that I'm alone in my desire to be a cognitive hypercomputationalist? Noooo . . . I can't believe it . . .

Thursday, 15. December 2005

Cool Juul

Impressive. A scientist with a sense of humor.

Saturday, 8. October 2005

Seeking chatbot study participants

Mark Marino, botmaster, blogger, and Ph.D. candidate at the University of California, Irvine, currently does a study among chatbot-users and -designers and invites everybody who has ever chatted with a bot to share the experience.
Calling all: Chatbot users and Chatbot Makers

If you have used or have built chatbots, or conversational agents, please participate in my online study of these research communities and their priorities. (Chatting with Non-Player Characters in video games counts here, too).

I am looking to get a sense of who makes bots, who uses them, and in what ways. The questions will only take a few minutes to answer, but participants can return to participate in ongoing discussions.

To participate, go to: http://wrt.ucr.edu/wordpress/chatbot-survey/

The study will continue until October 15.

This is a confidential study. Please see the site for information about privacy and participation.

Mark Marino
Ph.D. Candidate, UCR.
Mmarino [at] WriterResponseTheory.org.
It only takes 15 minutes to fill out the form. 15 minutes for you to push science forward. Go get yours!

Wednesday, 5. October 2005

Characters and the Loebner Prize Contest

Hugh Loebner, inventor and main sponsor of the Loebner Prize Contest, has an interpretation of what the Turing Test is meant to test for that differs from mine, yet seems to be shared by most parties interested in that test today: the judge, while communicating via a teletype equivalent with two candidates (Loebner calls them "confederates"), has to determine which of the two is the human and which is the machine. In other words, to win at the Imitation Game, a machine has to specifically imitate a human. Only then, it could be concluded, would Turing have called a machine intelligent.

I argue that this would mean that Turing would not consider real-life equivalents to HAL9000, Commander Data, or any Asimov-style robot that doesn't pretend to be human to be intelligent. However, it appears to me that a statement he makes at the end of Section 2 of Computing Machinery and Intelligence indicates that he would do so:
It might be urged that when playing the 'imitation game' the best strategy for the machine may possibly be something other than imitation of the behaviour of a man. This may be, but I think it is unlikely that there is any great effect of this kind. In any case there is no intention to investigate here the theory of the game, and it will be assumed that the best strategy is to try to provide answers that would naturally be given by a man.
I'm interested in interactive characters in general, so my reading of the above is that Turing didn't care what kind of creature the machine imitates, just as long as it would react with English output to English input in a way that's to be expected from any creature in order for an "intelligent" human to classify it as "intelligent". I mean, a real-life Commander Data equivalent will probably always answer the question "Are you human?" in the negative, but I, for one, would be likely to call such an artifact - if it could do in RL what Data does on TV - "intelligent", anyway. Based on this quote, I speculate that Turing would have done so, too, but it seems that the majority of his readers today insist that the "must imitate a human" rule is to be included for a Turing Test to be "the real" Turing Test. Anyway, that's how Loebner sees it, and since he foots the bill for the LPC, he totally pwns the contest rules.

Which he is just updating, to make the "writerly" interpretation of the term "character" completely irrelevant to next year's contest. Ironically, he does so in part by focusing on the ASCII sense of the term "character". He hasn't published the LPC 2006 rules on the contest homepage yet, but has already posted them to the Robitron, inciting intense discussion. The most controversial new rule is "Communications programs will be supplied by contest management."
I have written the communications programs. This is the way it will work: The confederates will sit in front of one computer, the judges will sit in front of one computer with a split screen having "Left" and "Right" screens. One screen will provide interactions with the bot, the other with the Confederate. Which screen is which will be decided by the flip of a coin. The entrants' computers will be in the hallway with the entrants. They will be able to monitor their programs if their programs write to the screen. The entrants' computers will run their programs only.
And as a way for those bot programs to interface with his comm program, he specifies the following algorithm:
1. Request the name of a directory once at start-up
2. Specify an output character by creating a sub-directory whose name conforms to the naming convention "time.character.other". Time must be resolved to milliseconds or higher resolution. In perl, one simply uses the command "mkdir name". Other languages will use other commands.
3. a. Capture an input character by reading the names of all sub-directories with the extension ".judge". In perl one uses glob("*.judge")
b. Delete the sub-directory
c. Process the information
In other words, there shall be no end-of-message markers. The bots are supposed to look into a network directory, wait until a complete message has been typed, and then respond to it. For the programmers, that's way cruel.

His reasoning:
The human confederate must face the same problem. If the human can do it, then the program must be able to do it also (it must imitate a human, remember).

I don't care how your program "knows" when to respond. My guess is that it should respond when it has received sufficient input to "understand" the input utterance of the judge or when it has sufficient input to "decide" to respond. Perhaps after receipt of a "." (period) if the judge is kind enough to include them at the end of his remark.
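Putting the posted algorithm and the period heuristic together, a bot-side sketch of the protocol might look like this in Python. To be clear, the directory-name handling and the sentence-end heuristic are my own guesses, not Loebner's code:

```python
import glob
import os
import time

# Sketch of the bot side of Loebner's directory-based protocol, as I
# understand the posted algorithm. Directory names and the sentence-end
# heuristic are my own guesses, not part of the official spec.

def send_char(base_dir, character, other="robot"):
    # Step 2: emit one output character by creating a sub-directory
    # named "time.character.other", with millisecond resolution.
    # (A real implementation would have to guarantee unique,
    # monotonically increasing names.)
    millis = int(time.time() * 1000)
    os.mkdir(os.path.join(base_dir, f"{millis}.{character}.{other}"))

def read_judge_chars(base_dir):
    # Step 3: collect the judge's characters from "*.judge"
    # sub-directories, oldest first, then delete them.
    chars = []
    for path in sorted(glob.glob(os.path.join(base_dir, "*.judge"))):
        chars.append(os.path.basename(path).split(".")[1])
        os.rmdir(path)
    return chars

def message_complete(buffer):
    # No end-of-message marker exists, so the bot has to guess;
    # waiting for a sentence-end character is one crude heuristic.
    return bool(buffer) and buffer[-1] in ".!?"
```

The main loop would then poll `read_judge_chars`, append to a buffer, and call the bot's reply logic once `message_complete` fires - which is precisely the guesswork the confederates get to do with human intuition.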
Another new rule is that the first reply of all confederates, whether human or machine, must be either "Hello, my name is John and I am a man" or "Hello, my name is Joan and I am a woman". In other words, the machine must be smart enough to recognize an arbitrary message when it sees one, but - at least on its first turn - has to give a canned response!

Those are gnarly rules.

Update 05 Oct 2005:
Hugh Loebner just posted the LPC 2006 contest rules to comp.ai.nat-lang and declared the case closed. He also posted his communications program that next year's bots will have to interface with, and you know what? Contrary to his announcements about wanting to exclude "non-verbal cues" from the human-machine communication, the protocol does allow for the transmission of the "return" symbol and end-of-sentence markers like "!" and "?". That makes the task somewhat easier. Nevertheless, several professional programmers on Robitron have complained that following his procedure will needlessly complicate the entrant programs without anybody having any advantage from that. Anybody except for Loebner, that is, who gets bragging rights: "I host my own contest, and I run it on my own code." Dude.
