If there's a problem with the Zipf curve, it's that the frequency differences between the inputs become very small very fast, and thus more and more useless for hanging any programmatic structure on. It doesn't tell me much if I happen to know that, in 1,000 conversations, the pattern "I LOVE YOU" was matched 20 times while "THAT IS THE SHIT" got 21 matches. The ranked patterns look like a random list.
Some AI developers therefore take an approach that looks for higher-level similarities between client behaviors and carves out larger "chunks" that can be addressed programmatically. Juergen Pirner, for instance, conceptualizes groups of client inputs as "tasks" and maintains a task list. Since I work with a functional programming paradigm, what's a "task" for him is a "function call" for me, but we're handling the same phenomena.
Let me step back a little: traditionally, Information Theory assumes that low-frequency signals carry a high information content, while high-frequency signals carry a high noise content (redundancy, &c.). Though not many people seem to be saying much about it at this point, natural languages work somewhat differently. For example, the pattern "YES" holds Rank 2 on my list, matching about 1.4 % of the inputs. That's two fifths of the percentage matched by Rank 1, the pattern which represents "[not recognized]" (i.e. "noise"), and it seems to be the same for the masters of English-speaking bots everywhere.
But "yes" is nothing like "noise"; rather, it seems to be a kind of textual "meaning compressor". Depending on what was said before - the context -, the decompressed text can be infinitely varied:
"Yes." -> "I agree with you." <- "Do you agree with me?"
"Yes." -> "I do not agree with you." <- "You mean you don't agree with me?"
"Yes." -> "I want to get married." <- "Will you marry me?"
"Yes." -> "I want a divorce." <- "Will you divorce me?"
.
.
.
To me, this implies that I should treat the pattern "YES" as a high-frequency signal which supplies a high information content, where that content depends, at each signal instance, on the conversational context. My reaction is to declare "YES" to be a "function call", and to require the function it calls to be total: by my theory, every AI output can provide the context for a "YES" input, so the system must be able to infer a meaning of "YES" as a reply to every line it can output. The fact is that typing or pasting "yes" into the input field regardless of the machine's output is one common way in which clients test the "awareness" of conversational interfaces. This suggests to me that I need a total function here, one which can assign a meaning to a "yes" input with reference to every possible output the machine can generate, and return a string that reacts to that meaning as the next output.
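To make this concrete, here is a minimal Haskell sketch of such a total function - my own illustration, not the actual system. It assumes (which the text above does not say) that every machine output can be classified into a small, closed set of dialogue acts, so that totality reduces to exhaustive pattern matching over that set; all type and constructor names are hypothetical.

```haskell
-- Hypothetical dialogue-act type: every output the machine can
-- produce is assumed to fall under one of these constructors.
data PrevOutput
  = AskedAgreement String     -- e.g. "Do you agree with me?"
  | AskedNegAgreement String  -- e.g. "You mean you don't agree with me?"
  | AskedProposal String      -- e.g. "Will you marry me?"
  | MadeStatement String      -- any declarative output

-- The "decompressed" meaning of a bare "yes", given the context.
-- The compiler checks that every constructor is handled, which is
-- what makes the function total over PrevOutput.
yesMeaning :: PrevOutput -> String
yesMeaning (AskedAgreement _)    = "I agree with you."
yesMeaning (AskedNegAgreement _) = "I do not agree with you."
yesMeaning (AskedProposal p)     = "Yes, I will: " ++ p
yesMeaning (MadeStatement s)     = "I accept that " ++ s

-- The next output reacts to that meaning; since yesMeaning is
-- total, so is the reply.
replyToYes :: PrevOutput -> String
replyToYes ctx = "So you are telling me: " ++ yesMeaning ctx
```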
Such a function might be hard to construct, but if I had one, it would cover 1.4 % of my inputs with context-relevant outputs, all in one fell swoop. Now I can take an even wider angle and say that my function should also accept as input symbols which I judge to be equivalent to "YES", like "THAT IS RIGHT", "FOR SURE", "CERTAINLY", &c. Those are not as frequent as "YES", but they're all in the Top 1,000, pushing the coverage towards, say, 1.7 %.
Let's call this group of patterns "agreement valuators" - which other "valuators" could I have? Why, "disagreement valuators", of course! If my function could also process the disagreement valuator "NO", that would add another 0.7 % to its input coverage, resulting in 2.4 %. Adding "NOT", "WRONG", "FALSE", I'm approaching 2.8 %. "Consent/denial valuators" like "GOOD", "BAD", "COOL", and "UNCOOL" would push me well over 3 %.
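One possible way to fold all of these surface forms into a single function call is to normalise them to a small set of valuator classes first. The sketch below does that with a plain lookup over the example patterns from the text; the class names and the grouping are my own assumptions.

```haskell
import Data.Char (toUpper)

-- Hypothetical valuator classes covering the pattern groups above.
data Valuator = Agreement | Disagreement | Consent | Denial
  deriving (Show, Eq)

-- Map a client input to a valuator class, if it is one; everything
-- else falls through to Nothing and is handled elsewhere.
classifyValuator :: String -> Maybe Valuator
classifyValuator input = case map toUpper input of
  "YES"           -> Just Agreement
  "THAT IS RIGHT" -> Just Agreement
  "FOR SURE"      -> Just Agreement
  "CERTAINLY"     -> Just Agreement
  "NO"            -> Just Disagreement
  "NOT"           -> Just Disagreement
  "WRONG"         -> Just Disagreement
  "FALSE"         -> Just Disagreement
  "GOOD"          -> Just Consent
  "COOL"          -> Just Consent
  "BAD"           -> Just Denial
  "UNCOOL"        -> Just Denial
  _               -> Nothing
```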
Therefore, it's desirable to write program text that provides such a total function: a function which processes all those "valuators" and maps them to a "reasonable" output. But how can I make sure that the output can actually be called "reasonable" by any measure? To test this, I can make use of another obvious high-frequency/high-information "meaning compressor" - the pattern "WHY".
What needs to happen here is basically the reverse of what happens in the "valuator" function: let's say the client got a valuator as output from the machine. If this valuator represents actual information (i.e. it is not redundant), humans almost reflexively ask for the reason behind it - "Why?" (example expansion: "What was the reason for you to think that I would agree with you?"). This is why the pattern "WHY" commands Rank 4 in my list (0.35 %), after "[not recognized]", "YES", and "NO". Since inputting serial "why"s is another popular way to test a bot, I want the "reason" function to be total, too: for each of its outputs, the system must be able to give a reason, which itself has a reason, &c.
Though this is somewhat difficult to implement in any existing programming language, the payoff I expect is definitely an incentive for me to work very hard at it: a totally defined "reason" function would service not only the "why" input, but also many other inputs that I interpret as being equivalent - "For what reason?", "How come?", "I don't think so", "I think you're wrong", &c. I see them all as "calling the reason function". So I integrate them as recognized function calls, and instead of having to muse about what to do with a certain pattern that has a 0.055 % matching probability, I just add it to the set of patterns that call "reason", the overall effect being that I push the coverage of this function up to 2 %.
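To show that a "reason" function which never runs out is at least constructible, here is one possible Haskell sketch - again my own construction under stated assumptions, not the author's implementation: every output carries a reason, and a self-referential fallback reason guarantees that a chain of serial "why"s can always be answered.

```haskell
-- Every output the machine produces is paired with a reason,
-- which is itself an output with a reason, and so on.
data Justified = Justified
  { utterance :: String
  , because   :: Justified   -- lazily evaluated chain of reasons
  }

-- Hypothetical fallback: a reason that is its own reason, so the
-- chain never bottoms out (laziness makes this definition legal).
axiom :: Justified
axiom = Justified "Because that is how my rules are written." axiom

-- Attach an explicit reason to an output; deeper "why"s fall back
-- onto the axiom.
withReason :: String -> String -> Justified
withReason out reason = Justified out (Justified reason axiom)

-- Answering the n-th serial "why" is then total by construction.
answerWhy :: Int -> Justified -> String
answerWhy n j
  | n <= 0    = utterance j
  | otherwise = answerWhy (n - 1) (because j)
```

For instance, answerWhy 1 (withReason "I like you." "You are polite.") yields the explicit reason, while answerWhy 2 of the same value already falls back onto the axiom - the chain is never allowed to run dry.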
All this means that, by integrating patterns which are distributed along the Zipf curve, I have a way of compressing it: in combination, the two functions I described already cover about 5 % of my input space. This is encouraging, so I'll extend the approach: even though I'm not likely to get the compression ratios of the top two functions as I go further down the curve, if I could find a dozen functions that integrate, say, the most frequent 5,000 patterns, that would give me something like 50 % of the coverage of the 2,000,000-pattern Parsimony system. So let's see: pattern "WHAT" (Rank 11) suggests a "purpose" function; pattern "WHAT IS *" (Rank 21) suggests a "definition" function . . .
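Put together, the architecture this suggests - once more only a sketch, with the pattern-to-function assignments taken from the ranks mentioned above - is a single dispatcher that routes each recognised pattern group to one of a handful of functions, instead of giving every pattern its own response:

```haskell
import Data.Char (toUpper)

-- The handful of function calls discussed so far, plus the two
-- candidates suggested by Ranks 11 and 21.
data Call = Valuator | Reason | Purpose | Definition | Unrecognized
  deriving (Show, Eq)

-- Route an input pattern to the function that handles its group.
-- The matching is deliberately crude; the point is only that a few
-- such routes already cover a large slice of the Zipf curve.
dispatch :: String -> Call
dispatch input = case words (map toUpper input) of
  ("YES"  : _)        -> Valuator
  ("NO"   : _)        -> Valuator
  ("WHY"  : _)        -> Reason
  ("WHAT" : "IS" : _) -> Definition
  ("WHAT" : _)        -> Purpose
  _                   -> Unrecognized
```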
Next stop: closed-world negation and partial functions.