Sunday, July 26, 2009

I kinda get Python. Kinda.

Phew, long week and long weekend. I had my weekend WTF moment when I caved to my daughter's pleas for tonic water. I figured that there was no way a 19-month-old would have more than one sip. She proved me wrong, downing all that I gave her. There's nothing quite like watching a toddler wander around with a rocks glass full of tonic. That should be on a Hallmark card or something.

Anyway, I've been slllooowwwwllly progressing with Python and Matplotlib. There are some concepts that are still beyond me.

I've continued to play with the Weasel program; it's interesting enough to keep me programming while giving me plenty of output to visualise. I've always wanted to see how changing the number of offspring per generation AND the mutation rate would affect the number of iterations needed to finish. This requires a lot of runs: not only do you need to simulate many combinations of mutation rate and number of offspring, you also need to run many simulations for each combination to get an average number of total iterations (because it's random, the results can differ quite a bit).
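A minimal sketch of what one such averaged run could look like. The names (`run_weasel`, `mean_generations`), the target phrase, and the selection rule (the best child replaces the parent, and the parent itself does not compete) are assumptions for illustration, not necessarily the script used for these results.

```python
import random
import string

# Assumed target phrase and alphabet (the classic weasel setup).
TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "


def mutate(parent, rate, rng):
    """Copy parent; each character is replaced by a random one with probability rate."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in parent)


def score(candidate):
    """Number of positions that already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def run_weasel(n_offspring, rate, rng, max_generations=100_000):
    """Return the number of generations needed to reach TARGET (capped)."""
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    generations = 0
    while parent != TARGET and generations < max_generations:
        children = [mutate(parent, rate, rng) for _ in range(n_offspring)]
        # Assumption: the best child always replaces the parent.
        parent = max(children, key=score)
        generations += 1
    return generations


def mean_generations(n_offspring, rate, n_runs, seed=0):
    """Average generations to completion over n_runs independent runs."""
    rng = random.Random(seed)
    runs = [run_weasel(n_offspring, rate, rng) for _ in range(n_runs)]
    return sum(runs) / len(runs)
```

Calling `mean_generations(100, 0.05, 30)` would give one averaged point; a full sweep just loops that call over every offspring/mutation-rate pair.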

This plot shows the variability in similar runs (and shows that I can do whisker plots now). The y-axis is the number of iterations to completion (LOG SCALE!), the x-axis is the number of offspring set for the simulation. Each point represents the average of 100 simulations using the same inputs; the whiskers show the standard deviation. As the number of offspring increases, both the standard deviation and the number of iterations decrease. This makes sense: with more offspring, you have a greater chance of a "good" mutation. The log scale makes the change look small, but it's orders of magnitude!
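A plot like this can be sketched with Matplotlib's `errorbar` and a log-scaled y-axis. The function name `plot_whiskers` and its input layout are assumptions for illustration; the whiskers here are the sample standard deviation of each group of runs.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_whiskers(offspring_counts, runs_per_setting):
    """runs_per_setting[i] holds the iteration counts of all runs at offspring_counts[i]."""
    means = np.array([np.mean(r) for r in runs_per_setting])
    stds = np.array([np.std(r, ddof=1) for r in runs_per_setting])  # sample std

    fig, ax = plt.subplots()
    # fmt="o" draws markers only; capsize adds the little whisker caps.
    ax.errorbar(offspring_counts, means, yerr=stds, fmt="o", capsize=3)
    ax.set_yscale("log")
    ax.set_xlabel("Offspring per generation")
    ax.set_ylabel("Iterations to completion (log scale)")
    return fig, ax, means, stds
```

One caveat with a log axis: if a standard deviation is larger than its mean, the lower whisker would reach zero or below, which a log scale cannot show, so that whisker gets clipped.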

I repeated this, but now I varied both mutation rate AND number of offspring. With 10,721 combinations, I lowered the number of runs per combination from 100 to 30. This left me with 321,630 weasel runs (two days of calculations!). This contour plot shows how the average number of iterations to completion varies when the two controls are changed. Note the log scale again, and also note that I had a very, very hard time setting up a proper log scale color bar. The bar goes from the low 10s to the high 400s.
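For anyone fighting the same color bar, one way that tends to work is to pass log-spaced levels together with a `LogNorm` to `contourf`. This is a sketch under assumptions: `plot_log_contour` and its argument layout are made up here, and it is not necessarily how the plot in this post was produced.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm


def plot_log_contour(mutation_rates, offspring_counts, mean_iterations, n_levels=12):
    """mean_iterations[i][j] is the average at offspring_counts[i], mutation_rates[j].

    All values must be positive, since they are placed on a log scale.
    """
    z = np.asarray(mean_iterations, dtype=float)
    # Log-spaced contour levels so each color band covers the same ratio.
    levels = np.logspace(np.log10(z.min()), np.log10(z.max()), n_levels)

    fig, ax = plt.subplots()
    filled = ax.contourf(
        mutation_rates, offspring_counts, z,
        levels=levels,
        norm=LogNorm(vmin=z.min(), vmax=z.max()),
    )
    cbar = fig.colorbar(filled, ax=ax)
    cbar.set_label("Mean iterations to completion")
    ax.set_xlabel("Mutation rate")
    ax.set_ylabel("Offspring per generation")
    return fig, ax, filled, cbar
```

The norm maps values to colors logarithmically, while the explicit levels keep the contour bands (and the color bar ticks) evenly spaced in log terms.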

What is interesting here is that if you follow the contours closely, you see a rebound in the number of iterations required once the mutation rate gets high enough. This means that for a given number of offspring, there is an optimal mutation rate for the fastest convergence: too low and the process drags on forever, too high and we cannot converge because there is so much variation. Although I expected this result, it was still neat to see it visualised like this.
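Pulling that optimum out of the sweep results is a small lookup. The sketch below assumes the averages are stored in a dictionary keyed by `(n_offspring, rate)`; that layout and the name `best_rate_per_offspring` are hypothetical.

```python
def best_rate_per_offspring(mean_iterations):
    """Map each offspring count to its (rate, iterations) pair with the fewest iterations.

    mean_iterations: dict mapping (n_offspring, rate) -> average iterations.
    """
    best = {}
    for (n_offspring, rate), iterations in mean_iterations.items():
        if n_offspring not in best or iterations < best[n_offspring][1]:
            best[n_offspring] = (rate, iterations)
    return best
```

For each row of the contour plot, this picks the mutation rate sitting at the bottom of the "valley" between the too-low and too-high regimes.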

The big personal accomplishment here was building these results - from the weasel runs, the scripting, all the way to the plotting - in Python. I would like to start using this language for work, but I don't want to waste research time learning an extra language.
