Saturday, 15. April 2006

Fusterclucked (I've been used and abused)

"Nice list", was a typically understated comment on Robitron to what, at least by my account, amounts to a very nice list: Juergen Pirner's Task List, a collection of common user behaviors (understood as "tasks") that are intended to test and/or "break" a natural language interface, or bot, that a user encounters on the Internet. Juergen calls it "a rough list of tasks Jabberwock (the "candidate") as a web based chatterbot is aware of", and with his consent, I re-post it here, so that it may inform a wider audience of AI researchers and fans. Being a bot on the web today is like walking around wearing a large "Kick me" sign on your ass, so as a primer for the abuse which any AI that's at the mercy of the general public has to endure, it's well worth studying.

Our larger discussion at that point revolved around the question of whether it is appropriate to speculate about a future dominated by super-human AI while most AIs that exist today break when you do something as unintelligent and mechanical as feeding them back their own output. I couldn't be convinced that it is, but I believe I made up for that by pointing the group towards the site. During CHI 2006, an international conference on human-computer interaction held April 24-27 in Montréal, these commendable people are organizing a workshop, "Misuse and Abuse of Interactive Technologies". From the blurb:
"The goal of this workshop is to address the darker side of HCI by examining how computers sometimes bring about the expression of negative emotions. In particular, we are interested in the phenomena of human beings abusing computers. Such behavior can take many forms, ranging from the verbal abuse of conversational agents to physically attacking the hardware. In some cases, particularly in the case of embodied conversational agents, there are questions about how the machine should respond to verbal assaults."
The workshop already took place in 2005; you can download the proceedings, or individual papers, from their site. Could be helpful.

Top 20 Hits

Zipf's law states that, while a few words are used very often, many or most words are used rarely. For those AI developers who categorize client inputs using pattern matching, this means that the pattern of the */default/miscellaneous category is invariably the one that is matched most frequently. This seems to hold true even for systems that service many clients and provide large data sets.
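A minimal sketch of why the wildcard ends up on top, assuming a toy matcher that only knows exact patterns plus a `*` fallback. The pattern set, the `normalize` helper and the sample inputs are all made up for illustration; real AIML matching also handles wildcards inside patterns and is considerably more involved.

```python
from collections import Counter

# Hypothetical, tiny pattern set; real AIML sets hold thousands of categories.
PATTERNS = {"YES", "NO", "WHY", "HOW ARE YOU"}
DEFAULT = "*"  # the default/miscellaneous category


def normalize(text: str) -> str:
    """Uppercase, drop punctuation and collapse whitespace."""
    kept = "".join(ch for ch in text.upper() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match(text: str) -> str:
    """Return the exact pattern that matches, or the wildcard default."""
    normalized = normalize(text)
    return normalized if normalized in PATTERNS else DEFAULT


def match_counts(inputs):
    """Count how often each pattern is matched over a list of inputs."""
    return Counter(match(line) for line in inputs)


# Invented sample inputs, not real chat logs.
sample = [
    "Yes!", "no", "why?", "How are you?",
    "tell me a joke", "what's the weather", "yes", "who made you",
]
counts = match_counts(sample)
for pattern, hits in counts.most_common():
    print(f"{hits / len(sample):6.1%}  {pattern}")
```

Every input that no specific pattern covers lands in `*`, so the long tail of rare inputs is counted under a single pattern.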

The Pandorabots bot hosting service, for instance, has responded to around 300,000,000 client inputs so far, and Dr. Richard Wallace (who ought to know) recently reported to the Robitron group that a Pandorabot's probability of matching with the default AIML category ranges between 2 and 5 percent, the wildcard pattern thus leading the Zipf curve. The actual percentage seems to depend on the botmaster's competence (and investment in dev time), but no bot has yet pushed it from the head of the curve.

In an earlier but related discussion on the Alicebot newslist, Alexander E. Richter, founder of the Parsimony bot hosting service (currently hosting more than 1,600 active bots), remarked that, when measuring with a bot that featured 2,000,000 AIML categories, it turned out that 5 % of those categories matched with 95 % of all inputs.

Here are my current Top 20 AIML patterns, complete with their estimated matching probabilities (after normalization and typo correction):

3.500 % *
1.400 % YES
0.700 % NO
0.350 % WHY
0.270 % HI/HELLO
0.210 % GOOD/COOL
0.200 % BYE
0.140 % HOW ARE YOU
0.110 % WHAT
0.080 % OH
0.077 % REALLY
0.075 % YOU
0.072 % I DO NOT KNOW
0.070 % FUCK YOU
0.068 % SO
0.065 % ME TOO
0.063 % LOL

The rounding is somewhat rough to increase readability; what I want to communicate is the proportions of the curve, which I'm sure are recognizable to botmasters everywhere. If it takes Parsimony 100,000 (= 5 % of 2,000,000) categories to match 95 % of their client inputs, around 7.834 % of those matches are made by the top 20 patterns.
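As a quick cross-check, the shares of the rows shown in the list above can simply be added up. The dictionary below copies those rows verbatim (combined entries such as HI/HELLO are kept as one row); nothing else is assumed. The rows as reproduced here come to roughly 7.45 percent, a little under the 7.834 % quoted for the full top 20, which would be consistent with a few entries not appearing in the list (that explanation is a guess).

```python
# Matching probabilities in percent, copied from the list above.
LISTED_SHARES = {
    "*": 3.500,
    "YES": 1.400,
    "NO": 0.700,
    "WHY": 0.350,
    "HI/HELLO": 0.270,
    "GOOD/COOL": 0.210,
    "BYE": 0.200,
    "HOW ARE YOU": 0.140,
    "WHAT": 0.110,
    "OH": 0.080,
    "REALLY": 0.077,
    "YOU": 0.075,
    "I DO NOT KNOW": 0.072,
    "FUCK YOU": 0.070,
    "SO": 0.068,
    "ME TOO": 0.065,
    "LOL": 0.063,
}

total = sum(LISTED_SHARES.values())
print(f"{len(LISTED_SHARES)} rows, {total:.3f} % of all matches")
```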

So, using the Parsimony system as a benchmark, I assume that it takes 20 categories to make a good 7 % of the matches, plus 99,980 to make 95 %, plus 1,900,000 to make 100 % (given the inputs of Parsimony's user community, which, with 1,600 bots and several thousand fora, is fairly large). As a ballpark measure, this seems good enough for me to use it at the mo.
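To make the proportions concrete, here is a small sketch that turns the three cumulative figures above into the share each tier contributes and the average share per category within a tier. The tier names and the function name are made up for illustration; the numbers are the ones from the text, with 7.834 % used for the top 20.

```python
# (tier name, number of categories, cumulative share of matches in percent)
TIERS = [
    ("top 20", 20, 7.834),
    ("next 99,980", 99_980, 95.0),
    ("remaining 1,900,000", 1_900_000, 100.0),
]


def per_category_share(tiers):
    """Return (name, tier share, average share per category) for each tier."""
    rows = []
    previous = 0.0
    for name, categories, cumulative in tiers:
        share = cumulative - previous  # what this tier adds on its own
        rows.append((name, share, share / categories))
        previous = cumulative
    return rows


rows = per_category_share(TIERS)
for name, share, average in rows:
    print(f"{name:>20}: {share:7.3f} % total, {average:.7f} % per category")
```

On these figures an average top-20 pattern matches several hundred times as often as an average pattern in the middle tier, and the last 1,900,000 categories share the remaining 5 % between them.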

What's your ballpark measure?
