This problem has an analytic solution that is much better than the naive one in practice, though still nowhere near as good as the Weasel applet (again, in practice).
A few notes for readers whose high school math may be rusty:
Let us begin by stating a little more formally the problem that the Weasel applet is designed to solve. Suppose that we have an unknown target string T comprised of any k symbols drawn from the alphabet A, where k and A are known. Our goal is to guess T, in pursuit of which we are also given a score function, S, that takes as input a guess at T and reports the number of mismatches between the guess and T. (Hence S returns an integer from 0 to k; when it returns 0, we have guessed T.)
So, for concreteness, if T happens to be "abc", S("abx") would yield 1, since one character of our guess differed from the actual target string T.
It may seem at first that S can't help us constrain our guess at T, that we simply have to try all strings of length k drawn from the alphabet A until we happen to get the right one -- a potential total of |A|^k guesses. But this is not our only option.
Suppose that we supply S with a guess comprised of k repetitions of a single alphabet symbol L. S then returns a score equal to k - x, where x is the number of times L occurs in T. For instance, suppose again that T is "abc" and suppose further that our alphabet A is the set of lower-case letters in the Latin (English) alphabet. Then:
S("aaa") reports 2 (= k - 1), so there is 1 "a" in T.
S("bbb") reports 2 (= k - 1), so there is 1 "b" in T.
S("ccc") reports 2 (= k - 1), so there is 1 "c" in T.
S("ddd") reports 3 (= k - 0), so there are 0 "d"s in T.
...
S("zzz") reports 3 (= k - 0), so there are 0 "z"s in T.
Now we know that T is composed of 1 "a," 1 "b," and 1 "c," although we don't know the order yet. But now our search space is dramatically constrained: instead of invoking S on up to |A|^k = 26^3 = 17,576 guesses, we merely need to invoke it on up to k! = 3 * 2 * 1 = 6 guesses, reflecting all possible orderings of the three letters now known to be in T. (Since we also needed to call S once for each symbol in A, the solution really requires k! + |A| invocations of S, but since |A| is normally so much smaller than k! for interesting problems, let's stick with just saying k!. We can also do somewhat better when some letters are used more than once, but let's be pessimistic and ignore that possibility from now on.)
In general, the new solution is worse than the naive approach of trying all possible guesses, since |A|^k grows more slowly than k! as k increases (the proof is left as an exercise for the reader). In other words, the naive solution is better when the strings grow to be long, where the definition of "long" depends partly on |A|. Even so, for target strings we might have the patience to tackle in real life, the new solution's k! is usually smaller than the naive solution's |A|^k.
In the default example supplied with the Weasel applet, trying to guess "Methinks it is like a weasel" (k = 28, |A| = 68) requires up to k! = 28! = 304,888,344,611,713,860,501,504,000,000, or about 304 billion billion billion guesses. While this is a huge number of guesses, it's still much less work than the naive solution requires -- it's 6,700,523,481,248,868,330,560, or about 6700 billion billion, times faster than the naive solution.
However, it should be abundantly clear that even this improvement can't compare to the mere 60,000 guesses (or so) typically required by the genetic solver implemented in the Weasel applet.
In the case of the Weasel applet, alert reader Seth Carbon observed that this particular problem has an analytic solution that is even better than Nature's own way, and much better than the analytic solutions I thought of. Quoting from his email:
Let G be our string and k = |G|, all other variables and functions
are as they appeared on the webpage. After |A| iterations the function S
discovers that there are R different letters in G. Each of these R
letters appear NR times (i.e. N1+N2+...+N(R-1)+NR = k). The total number
of solutions would then be (k!/N1!*N2!*...*NR!) = Z. In degenerate cases,
Z is at least as small as k!. In more reasonable cases, the difference can
be orders of magnitude.
For example, in your case where G = "abc" and k = 3, there are
(3!/1!*1!*1!) = (6/1) = 6 different cases. If G = "aaabbc" and k = 6,
there are (6!/3!*2!*1!) = (6!/6*4) = 30 solutions, instead of k! = 720. If
G = "aaabbbccc" and k = 9, there are (9!/3!*3!*3!) = 1680 solutions,
instead of k! = 362880. Note that the size of the space containing all 9
letter strings containing just a, b, and c is 19683. Variations in
N1,...,NR can make a lot of difference in what kind of numbers you get.
Now for a quick solution [...]. The algorithm works like
this: first go through |A| comparisons to determine the number of each
letter. Generate a string that is k long of the highest scoring
letter. Now, take the next highest scoring letter and slide it across
our string taking note of when it gets farther away from our target
string (that means that the highest scoring goes there) or closer
(that is where the current letter that we a sliding belongs). Once it
has been determined where a letter is, make sure that that spot is not
checked again. Repeat the last two sentences until we have no more
letters. The numbers of comparisons needed is |A| + O(k^2) in the
worst case. Much faster--using an alphabet of 94 characters [...], it
takes only 259 guesses to find the string "Methinks it is like a
weasel". I wonder what hybrid solutions would produce?
Of course, we should not let all this fascinating math distract us from the point of the demonstration, that evolutionary techniques can solve such problems much faster than you might think, even when they are not the only solution or even necessarily the best!
s-max@pacbell.net