<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6424155829455583334</id><updated>2011-11-28T02:24:23.387+02:00</updated><category term='Clojure'/><category term='Common Lisp'/><title type='text'>The beauty of grey</title><subtitle type='html'>The world is not black and white.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://t-b-o-g.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6424155829455583334/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://t-b-o-g.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Jānis Džeriņš</name><uri>http://www.blogger.com/profile/06846740575608764708</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>4</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6424155829455583334.post-6521927392701820525</id><published>2009-12-07T13:49:00.001+02:00</published><updated>2010-02-02T16:06:22.496+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Common Lisp'/><title type='text'>Brian's brain, on Common Lisp, take 3</title><content type='html'>&lt;div id="content"&gt;
&lt;p&gt;
This is the last installment about the straight-forward implementation of Brian's Brain in which I'm
going to try to squeeze some more performance out of the brute-force implementation (you should get
familiar with &lt;a href="http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-common-lisp.html"&gt;first&lt;/a&gt; and &lt;a href="http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-common-lisp-take-2.html"&gt;second&lt;/a&gt; post to get the full picture).  The point of the article is not about
how fast I can get it, but rather about what performance defference can be achieved using "simple"
performance tweaking measures.
&lt;/p&gt;
&lt;p&gt;
As any good programmer will tell you – optimizing without profiling first is a waste of time.  To
profile my code I will use SBCL's statistical profiler.  It is called statistical because of the way
it works: interrupting the running program at regular intervals and looking what code is running at
that moment.
&lt;/p&gt;
&lt;p&gt;
To get the idea of where the time is spent in my program I'll do a single simulation from our
benchmark (with an increased iteration count to get &lt;i&gt;more correct statistics&lt;/i&gt;):
&lt;/p&gt;



&lt;pre class="example"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;CL-USER&amp;gt; (sb-sprof:with-profiling (:report :flat :loop nil)
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;           (simulate 4096 (make-initialised-brain 128 128) 128 128))
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;Number of samples:   4426
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;Sample interval:     0.01 seconds
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;Total sampling time: 44.26 seconds
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;Number of cycles:    0
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;Sampled threads:
&lt;span class="linenr"&gt; 9:  &lt;/span&gt; #&amp;lt;SB-THREAD:THREAD "initial thread" RUNNING {10029559D1}&amp;gt;
&lt;span class="linenr"&gt;10:  &lt;/span&gt;
&lt;span class="linenr"&gt;11:  &lt;/span&gt;           Self        Total        Cumul
&lt;span class="linenr"&gt;12:  &lt;/span&gt;  Nr  Count     %  Count     %  Count     %    Calls  Function
&lt;span class="linenr"&gt;13:  &lt;/span&gt;------------------------------------------------------------------------
&lt;span class="linenr"&gt;14:  &lt;/span&gt;   1   1401  31.7   1401  31.7   1401  31.7        -  "foreign function sigprocmask"
&lt;span class="linenr"&gt;15:  &lt;/span&gt;   2   1000  22.6   1737  39.2   2401  54.2        -  NEIGHBOURS
&lt;span class="linenr"&gt;16:  &lt;/span&gt;   3    552  12.5   1056  23.9   2953  66.7        -  COUNT
&lt;span class="linenr"&gt;17:  &lt;/span&gt;   4    200   4.5    200   4.5   3153  71.2        -  SB-IMPL::OPTIMIZED-DATA-VECTOR-REF
&lt;span class="linenr"&gt;18:  &lt;/span&gt;   5    179   4.0    179   4.0   3332  75.3        -  SB-KERNEL:HAIRY-DATA-VECTOR-REF/CHECK-BOUNDS
&lt;span class="linenr"&gt;19:  &lt;/span&gt;   6    173   3.9    173   3.9   3505  79.2        -  SB-VM::GENERIC-+
&lt;span class="linenr"&gt;20:  &lt;/span&gt;   7    145   3.3    247   5.6   3650  82.5        -  (LAMBDA (SB-IMPL::X))
&lt;span class="linenr"&gt;21:  &lt;/span&gt;   8    143   3.2    143   3.2   3793  85.7        -  TRUNCATE
&lt;span class="linenr"&gt;22:  &lt;/span&gt;   9    128   2.9    128   2.9   3921  88.6        -  SB-KERNEL:%COERCE-CALLABLE-TO-FUN
&lt;span class="linenr"&gt;23:  &lt;/span&gt;  10    116   2.6    139   3.1   4037  91.2        -  LENGTH
&lt;span class="linenr"&gt;24:  &lt;/span&gt;  11     80   1.8     80   1.8   4117  93.0        -  EQL
&lt;span class="linenr"&gt;25:  &lt;/span&gt;  12     67   1.5   2970  67.1   4184  94.5        -  EVOLVE
&lt;span class="linenr"&gt;26:  &lt;/span&gt;  13     51   1.2     51   1.2   4235  95.7        -  (FLET LT)
&lt;span class="linenr"&gt;27:  &lt;/span&gt;  14     42   0.9     42   0.9   4277  96.6        -  (FLET RT)
&lt;span class="linenr"&gt;28:  &lt;/span&gt;  15     41   0.9     41   0.9   4318  97.6        -  SB-KERNEL:SEQUENCEP
&lt;span class="linenr"&gt;29:  &lt;/span&gt;  16     30   0.7     34   0.8   4348  98.2        -  NTHCDR
&lt;span class="linenr"&gt;30:  &lt;/span&gt;  17     27   0.6   1076  24.3   4375  98.8        -  RULES
&lt;span class="linenr"&gt;31:  &lt;/span&gt;  18     22   0.5     22   0.5   4397  99.3        -  (FLET #:CLEANUP-FUN-[CALL-WITH-SYSTEM-MUTEX/WITHOUT-GCING]146)
&lt;span class="linenr"&gt;32:  &lt;/span&gt;  19     17   0.4     17   0.4   4414  99.7        -  "foreign function mach_msg_server"
&lt;span class="linenr"&gt;33:  &lt;/span&gt;  20     11   0.2     11   0.2   4425 100.0        -  (FLET SB-IMPL::FAST-NTHCDR)
&lt;span class="linenr"&gt;34:  &lt;/span&gt;  21      1  0.02      1  0.02   4426 100.0        -  (FLET #:CLEANUP-FUN-[INVOKE-INTERRUPTION]41)
&lt;span class="linenr"&gt;35:  &lt;/span&gt;  22      0   0.0   2973  67.2   4426 100.0        -  SIMULATE
&lt;span class="linenr"&gt;36:  &lt;/span&gt;...
&lt;/pre&gt;




&lt;p&gt;
Ignoring the first entry related to &lt;code&gt;sigprocmask&lt;/code&gt; (because that's sure some implementation detail
related to GC, profiling itself, or something else I should not bother with) we can see our top
candidates for optimization:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;neighbours&lt;/code&gt; – as expected, this is the function called most frequently (for every cell at every
simulation iteration).
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;count&lt;/code&gt; – standard Common Lisp function (called with the results of &lt;code&gt;neighbours&lt;/code&gt; to count the
number of &lt;i&gt;on&lt;/i&gt; cells).
&lt;/li&gt;
&lt;li&gt;
Some implementation-specific functions dealing with vectors.  One of these functions contains a
word "hairy" which I think means that SBCL could use some hints.  Declaring the &lt;code&gt;cells&lt;/code&gt; array
type should hopefully get rid of these.  Also should help with calls to &lt;code&gt;length&lt;/code&gt; seen a few rows
below.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sb-vm::generic-+&lt;/code&gt; – most probably from &lt;code&gt;neighbours&lt;/code&gt; function.  The name implies that the generic
version of &lt;code&gt;+&lt;/code&gt; function (addition) is used, instead of using &lt;code&gt;fixnum&lt;/code&gt; specific one.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;eql&lt;/code&gt; – might be from call to &lt;code&gt;count&lt;/code&gt; in &lt;code&gt;rules&lt;/code&gt; function, being the default &lt;code&gt;:test&lt;/code&gt; function.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;(flet lt)&lt;/code&gt; and &lt;code&gt;(flet rt)&lt;/code&gt; – these are from within &lt;code&gt;neighbours&lt;/code&gt;.  I'm pretty sure they are
counted in one of the columns in &lt;code&gt;neighbours&lt;/code&gt; row.

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The optimization plan now is simple&lt;sup&gt;&lt;a class="footref" name="fnr.1" href="#fn.1"&gt;1&lt;/a&gt;&lt;/sup&gt;:
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
Declare the type of &lt;code&gt;cells&lt;/code&gt; array.
&lt;/li&gt;
&lt;li&gt;
Optimize &lt;code&gt;neighbours&lt;/code&gt; function.
&lt;/li&gt;
&lt;li&gt;
Sprinkle some more declarations as needed.

&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first step is easy – all I have to do is add the following declaration to uses of &lt;code&gt;cells&lt;/code&gt;
array, which tells the compiler that the array is one-dimensional (undefined length) and its
elements are of type &lt;code&gt;(integer 0 2)&lt;/code&gt; (integers from &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;2&lt;/code&gt;):
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;declare&lt;/span&gt; (type (simple-array (integer 0 2) (*)) cells))
&lt;/pre&gt;




&lt;p&gt;
OK, might as well make it a bit more generic (although just for the sake of example; the program is
too short to make this worthwhile).  Two types are introduced as "aliases" (that is, upon seeing one
of these the compiler will expand them to corresponding types in the definition; a lot more
complicated type declarations are &lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/m_deftp.htm"&gt;possible&lt;/a&gt;):
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;deftype&lt;/span&gt; &lt;span style="color: #0000ff; font-weight: bold;"&gt;cell&lt;/span&gt; () '(integer 0 2))
&lt;span class="linenr"&gt;2:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;deftype&lt;/span&gt; &lt;span style="color: #0000ff; font-weight: bold;"&gt;brain&lt;/span&gt; () '(simple-array cell (*)))
&lt;/pre&gt;




&lt;p&gt;
One important thing I should note here is that type declarations are optional in Common Lisp and the
compiler is allowed to ignore them altogether.  But they still provide a structured way to document
code and all implementations I'm familiar with will use the declarations, if provided.
&lt;/p&gt;
&lt;p&gt;
There are two extreme cases in type-checking:
&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;
Everything is checked and declarations are ignored.
&lt;/li&gt;
&lt;li&gt;
Declarations are trusted and no type-checking is done.

&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In Common Lisp there is a way to instruct the compiler how to operate: generate &lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_s.htm#safe"&gt;safe&lt;/a&gt; (but due to
checking usually relatively slow) or &lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_u.htm#unsafe"&gt;unsafe&lt;/a&gt; (probably fast due to the absence of checks) code.
There are two settings (&lt;i&gt;optimization qualities&lt;/i&gt;) to &lt;i&gt;directlyl&lt;/i&gt; control this behavior: &lt;code&gt;speed&lt;/code&gt; and
&lt;code&gt;safety&lt;/code&gt;.  These settings can take values from &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;3&lt;/code&gt; (zero being the "unimportant" value, &lt;code&gt;3&lt;/code&gt;
being "very important").  &lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/d_optimi.htm#optimize"&gt;&lt;code&gt;optimize&lt;/code&gt;&lt;/a&gt; &lt;i&gt;declaration identifier&lt;/i&gt; is used to specify these qualities,
like this:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;declare&lt;/span&gt; (optimize (speed 3) (safety 1)))
&lt;/pre&gt;




&lt;p&gt;
Specifying that speed is very important will also make some compilers (including SBCL) to be very
chatty about why it cannot optimize particular parts of the code.  Other implementations have
specific declarations to make the compiler output optimization such notes.
&lt;/p&gt;
&lt;p&gt;
Since I'll be experimenting with different optimization settings I'm going to specify them once for
the whole file.  It can be done with the following form at the beggining of the file:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;declaim&lt;/span&gt; (optimize (speed 3) (safety 1)))
&lt;/pre&gt;




&lt;p&gt;
Such optimization declarations can also be added on a per-function basis (or even for a specific
form within a function), and is actually what is being done in day-to-day programming to
optimize only the speed-critical sections of code.
&lt;/p&gt;
&lt;p&gt;
And by the way – never, never ever run code with &lt;code&gt;safety&lt;/code&gt; &lt;code&gt;0&lt;/code&gt; (unless you know what you are doing,
which practically is never).  But since this is a toy program anyway, might as well try doing it to
see what performance impact does it have.
&lt;/p&gt;
&lt;p&gt;
Here are the &lt;code&gt;neighbours&lt;/code&gt; and &lt;code&gt;evolve&lt;/code&gt; functions with added array declarations:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;neighbours&lt;/span&gt; (cells i w h)
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;declare&lt;/span&gt; (type brain cells))
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((l (length cells))
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;        (mx (1- w))
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;        (my (1- h)))
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;multiple-value-bind&lt;/span&gt; (y x)
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;        (truncate i w)
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;      (&lt;span style="color: #4682b4; font-weight: bold;"&gt;flet&lt;/span&gt; ((up (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (zerop y) (- (+ i l) w) (- i w)))
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;             (dn (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= y  my) (- (+ i w) l) (+ i w)))
&lt;span class="linenr"&gt;10:  &lt;/span&gt;             (lt (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (zerop x) (1- (+ i w))  (1- i)))
&lt;span class="linenr"&gt;11:  &lt;/span&gt;             (rt (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= x  mx) (1+ (- i w))  (1+ i))))
&lt;span class="linenr"&gt;12:  &lt;/span&gt;        (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let*&lt;/span&gt; ((u (up i))
&lt;span class="linenr"&gt;13:  &lt;/span&gt;               (d (dn i))
&lt;span class="linenr"&gt;14:  &lt;/span&gt;               (l (lt i))
&lt;span class="linenr"&gt;15:  &lt;/span&gt;               (r (rt i))
&lt;span class="linenr"&gt;16:  &lt;/span&gt;               (ul (lt u))
&lt;span class="linenr"&gt;17:  &lt;/span&gt;               (ur (rt u))
&lt;span class="linenr"&gt;18:  &lt;/span&gt;               (dl (lt d))
&lt;span class="linenr"&gt;19:  &lt;/span&gt;               (dr (rt d)))
&lt;span class="linenr"&gt;20:  &lt;/span&gt;          (mapcar (&lt;span style="color: #4682b4; font-weight: bold;"&gt;lambda&lt;/span&gt; (i) (aref cells i))
&lt;span class="linenr"&gt;21:  &lt;/span&gt;                  (list ul u ur l r dl d dr)))))))
&lt;span class="linenr"&gt;22:  &lt;/span&gt;
&lt;span class="linenr"&gt;23:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;evolve&lt;/span&gt; (src w h)
&lt;span class="linenr"&gt;24:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;declare&lt;/span&gt; (type brain src))
&lt;span class="linenr"&gt;25:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let*&lt;/span&gt; ((l (length src))
&lt;span class="linenr"&gt;26:  &lt;/span&gt;         (dst (make-brain w h)))
&lt;span class="linenr"&gt;27:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;declare&lt;/span&gt; (type brain dst))
&lt;span class="linenr"&gt;28:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; for i below l
&lt;span class="linenr"&gt;29:  &lt;/span&gt;       do (setf (aref dst i)
&lt;span class="linenr"&gt;30:  &lt;/span&gt;                (rules (aref src i) (neighbours src i w h))))
&lt;span class="linenr"&gt;31:  &lt;/span&gt;    dst))
&lt;/pre&gt;




&lt;p&gt;
Compiling these produces a lot of complaints from the compiler.  Some of the notes from the
&lt;code&gt;neighbours&lt;/code&gt; function look like this:
&lt;/p&gt;



&lt;pre class="example"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;; in: DEFUN NEIGHBOURS
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;;     (ZEROP X)
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;; ==&amp;gt;
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;;   (= X 0)
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;; 
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;; note: unable to
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;;   open-code FLOAT to RATIONAL comparison
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;; due to type uncertainty:
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;;   The first argument is a REAL, not a FLOAT.
&lt;span class="linenr"&gt;10:  &lt;/span&gt;; 
&lt;span class="linenr"&gt;11:  &lt;/span&gt;; note: unable to open code because: The operands might not be the same type.
&lt;span class="linenr"&gt;12:  &lt;/span&gt;
&lt;span class="linenr"&gt;13:  &lt;/span&gt;;     (+ I W)
&lt;span class="linenr"&gt;14:  &lt;/span&gt;; 
&lt;span class="linenr"&gt;15:  &lt;/span&gt;; note: unable to
&lt;span class="linenr"&gt;16:  &lt;/span&gt;;   optimize
&lt;span class="linenr"&gt;17:  &lt;/span&gt;; due to type uncertainty:
&lt;span class="linenr"&gt;18:  &lt;/span&gt;;   The first argument is a REAL, not a RATIONAL.
&lt;span class="linenr"&gt;19:  &lt;/span&gt;;   The second argument is a NUMBER, not a FLOAT.
&lt;span class="linenr"&gt;20:  &lt;/span&gt;; 
&lt;span class="linenr"&gt;21:  &lt;/span&gt;; note: unable to
&lt;span class="linenr"&gt;22:  &lt;/span&gt;;   optimize
&lt;span class="linenr"&gt;23:  &lt;/span&gt;; due to type uncertainty:
&lt;span class="linenr"&gt;24:  &lt;/span&gt;;   The first argument is a REAL, not a FLOAT.
&lt;span class="linenr"&gt;25:  &lt;/span&gt;;   The second argument is a NUMBER, not a RATIONAL.
&lt;span class="linenr"&gt;26:  &lt;/span&gt;; 
&lt;span class="linenr"&gt;27:  &lt;/span&gt;; note: unable to
&lt;span class="linenr"&gt;28:  &lt;/span&gt;;   optimize
&lt;span class="linenr"&gt;29:  &lt;/span&gt;; due to type uncertainty:
&lt;span class="linenr"&gt;30:  &lt;/span&gt;;   The first argument is a REAL, not a SINGLE-FLOAT.
&lt;span class="linenr"&gt;31:  &lt;/span&gt;;   The second argument is a NUMBER, not a DOUBLE-FLOAT.
&lt;span class="linenr"&gt;32:  &lt;/span&gt;; 
&lt;span class="linenr"&gt;33:  &lt;/span&gt;; note: unable to
&lt;span class="linenr"&gt;34:  &lt;/span&gt;;   optimize
&lt;span class="linenr"&gt;35:  &lt;/span&gt;; due to type uncertainty:
&lt;span class="linenr"&gt;36:  &lt;/span&gt;;   The first argument is a REAL, not a DOUBLE-FLOAT.
&lt;span class="linenr"&gt;37:  &lt;/span&gt;;   The second argument is a NUMBER, not a SINGLE-FLOAT.
&lt;span class="linenr"&gt;38:  &lt;/span&gt;...
&lt;/pre&gt;




&lt;p&gt;
Profiling the code now shows that I've got rid of the two &lt;code&gt;vector-ref&lt;/code&gt; functions.  But there is
still a bit of work to do on &lt;code&gt;neighbours&lt;/code&gt; function with all the noise compiler generates.  First
I'll declare the types of parameters (and show only the relevant forms here):
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;deftype&lt;/span&gt; &lt;span style="color: #0000ff; font-weight: bold;"&gt;array-index&lt;/span&gt; () `(integer 0 ,array-dimension-limit))
&lt;span class="linenr"&gt;2:  &lt;/span&gt;
&lt;span class="linenr"&gt;3:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;declare&lt;/span&gt; (type brain cells)
&lt;span class="linenr"&gt;4:  &lt;/span&gt;         (type array-index i)
&lt;span class="linenr"&gt;5:  &lt;/span&gt;         (type fixnum w h))
&lt;/pre&gt;




&lt;p&gt;
A note on the &lt;code&gt;array-index&lt;/code&gt; type: the actual integer range depends on the value of
&lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/v_ar_dim.htm"&gt;&lt;code&gt;array-dimension-limit&lt;/code&gt;&lt;/a&gt;, which is &lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_i.htm#implementation-dependent"&gt;implementation dependent&lt;/a&gt; (but usually is between 2&lt;sup&gt;24&lt;/sup&gt; and
&lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/v_most_p.htm#most-positive-fixnum"&gt;&lt;code&gt;most-positive-fixnum&lt;/code&gt;&lt;/a&gt;).
&lt;/p&gt;
&lt;p&gt;
Now compiling &lt;code&gt;neighbours&lt;/code&gt; function emits quite a short list of compiler notes:
&lt;/p&gt;



&lt;pre class="example"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;; in: DEFUN NEIGHBOURS
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;;     (LIST UL U UR L R DL D DR)
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;; 
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;; note: doing signed word to integer coercion (cost 20)
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;; 
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;; note: doing signed word to integer coercion (cost 20)
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;;     (FLET ((UP (I)
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;;              (IF (ZEROP Y)
&lt;span class="linenr"&gt;10:  &lt;/span&gt;;                  (- # W)
&lt;span class="linenr"&gt;11:  &lt;/span&gt;;                  (- I W)))
&lt;span class="linenr"&gt;12:  &lt;/span&gt;;            (DN (I)
&lt;span class="linenr"&gt;13:  &lt;/span&gt;;              (IF (= Y MY)
&lt;span class="linenr"&gt;14:  &lt;/span&gt;;                  (- # L)
&lt;span class="linenr"&gt;15:  &lt;/span&gt;;                  (+ I W)))
&lt;span class="linenr"&gt;16:  &lt;/span&gt;;            (LT (I)
&lt;span class="linenr"&gt;17:  &lt;/span&gt;;              (IF (ZEROP X)
&lt;span class="linenr"&gt;18:  &lt;/span&gt;;                  (1- #)
&lt;span class="linenr"&gt;19:  &lt;/span&gt;;                  (1- I)))
&lt;span class="linenr"&gt;20:  &lt;/span&gt;;            (RT (I)
&lt;span class="linenr"&gt;21:  &lt;/span&gt;;              (IF (= X MX)
&lt;span class="linenr"&gt;22:  &lt;/span&gt;;                  (1+ #)
&lt;span class="linenr"&gt;23:  &lt;/span&gt;;                  (1+ I))))
&lt;span class="linenr"&gt;24:  &lt;/span&gt;;       (LET* ((U (UP I))
&lt;span class="linenr"&gt;25:  &lt;/span&gt;;              (D (DN I))
&lt;span class="linenr"&gt;26:  &lt;/span&gt;;              (L (LT I))
&lt;span class="linenr"&gt;27:  &lt;/span&gt;;              (R (RT I))
&lt;span class="linenr"&gt;28:  &lt;/span&gt;;              (UL (LT U))
&lt;span class="linenr"&gt;29:  &lt;/span&gt;;              (UR (RT U))
&lt;span class="linenr"&gt;30:  &lt;/span&gt;;              (DL (LT D))
&lt;span class="linenr"&gt;31:  &lt;/span&gt;;              (DR (RT D)))
&lt;span class="linenr"&gt;32:  &lt;/span&gt;;         (MAPCAR (LAMBDA (I) (AREF CELLS I)) (LIST UL U UR L R DL D DR))))
&lt;span class="linenr"&gt;33:  &lt;/span&gt;; 
&lt;span class="linenr"&gt;34:  &lt;/span&gt;; note: doing signed word to integer coercion (cost 20) to "&amp;lt;return value&amp;gt;"
&lt;span class="linenr"&gt;35:  &lt;/span&gt;; 
&lt;span class="linenr"&gt;36:  &lt;/span&gt;; note: doing signed word to integer coercion (cost 20) to "&amp;lt;return value&amp;gt;"
&lt;span class="linenr"&gt;37:  &lt;/span&gt;; 
&lt;span class="linenr"&gt;38:  &lt;/span&gt;; compilation unit finished
&lt;span class="linenr"&gt;39:  &lt;/span&gt;;   printed 4 notes
&lt;/pre&gt;




&lt;p&gt;
The two offending forms are &lt;code&gt;flet&lt;/code&gt; and &lt;code&gt;list&lt;/code&gt;.  One solution now would be to maybe declare types of
all the eight cells in &lt;code&gt;let*&lt;/code&gt; form.  But is there anything about this code that bothers you?  You
see, lists in Common Lisp can hold any kind of values, and making a list takes time and memory.  And
this code actually creates two lists (one with the cell indices, and the one created by &lt;code&gt;mapcar&lt;/code&gt;
function).  So telling the compiler that the things that are stuffed into those lists are in fact
numbers does not make much sense (or performance difference) because the lists are still being
created and traversed.
&lt;/p&gt;
&lt;p&gt;
So, in the name of optimization I'll go on and do what has to be done: make a single array for
neighbours calculation and update it on each call.  Here's the ugly-but-supposedly-fast version:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;deftype&lt;/span&gt; &lt;span style="color: #0000ff; font-weight: bold;"&gt;neighbours&lt;/span&gt; () '(simple-array cell (8)))
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;neighbours&lt;/span&gt; (cells i w h)
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;declare&lt;/span&gt; (type brain cells)
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;           (type array-index i)
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;           (type fixnum w h))
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((result (&lt;span style="color: #4682b4; font-weight: bold;"&gt;the&lt;/span&gt; neighbours (load-time-value (make-array 8 &lt;span style="color: #cd0000;"&gt;:element-type&lt;/span&gt; 'cell))))
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;        (l (length cells))
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;        (mx (1- w))
&lt;span class="linenr"&gt;10:  &lt;/span&gt;        (my (1- h)))
&lt;span class="linenr"&gt;11:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;multiple-value-bind&lt;/span&gt; (y x)
&lt;span class="linenr"&gt;12:  &lt;/span&gt;        (truncate i w)
&lt;span class="linenr"&gt;13:  &lt;/span&gt;      (&lt;span style="color: #4682b4; font-weight: bold;"&gt;flet&lt;/span&gt; ((up (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (zerop y) (- (+ i l) w) (- i w)))
&lt;span class="linenr"&gt;14:  &lt;/span&gt;             (dn (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= y  my) (- (+ i w) l) (+ i w)))
&lt;span class="linenr"&gt;15:  &lt;/span&gt;             (lt (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (zerop x) (1- (+ i w))  (1- i)))
&lt;span class="linenr"&gt;16:  &lt;/span&gt;             (rt (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= x  mx) (1+ (- i w))  (1+ i))))
&lt;span class="linenr"&gt;17:  &lt;/span&gt;        (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let*&lt;/span&gt; ((u (up i))
&lt;span class="linenr"&gt;18:  &lt;/span&gt;               (d (dn i))
&lt;span class="linenr"&gt;19:  &lt;/span&gt;               (l (lt i))
&lt;span class="linenr"&gt;20:  &lt;/span&gt;               (r (rt i))
&lt;span class="linenr"&gt;21:  &lt;/span&gt;               (ul (lt u))
&lt;span class="linenr"&gt;22:  &lt;/span&gt;               (ur (rt u))
&lt;span class="linenr"&gt;23:  &lt;/span&gt;               (dl (lt d))
&lt;span class="linenr"&gt;24:  &lt;/span&gt;               (dr (rt d)))
&lt;span class="linenr"&gt;25:  &lt;/span&gt;          (setf (aref result 0) (aref cells ul)
&lt;span class="linenr"&gt;26:  &lt;/span&gt;                (aref result 1) (aref cells u)
&lt;span class="linenr"&gt;27:  &lt;/span&gt;                (aref result 2) (aref cells ur)
&lt;span class="linenr"&gt;28:  &lt;/span&gt;                (aref result 3) (aref cells l)
&lt;span class="linenr"&gt;29:  &lt;/span&gt;                (aref result 4) (aref cells r)
&lt;span class="linenr"&gt;30:  &lt;/span&gt;                (aref result 5) (aref cells dl)
&lt;span class="linenr"&gt;31:  &lt;/span&gt;                (aref result 6) (aref cells d)
&lt;span class="linenr"&gt;32:  &lt;/span&gt;                (aref result 7) (aref cells dr))
&lt;span class="linenr"&gt;33:  &lt;/span&gt;          result)))))
&lt;/pre&gt;




&lt;p&gt;
First I define a new type that describes the &lt;i&gt;neighbours&lt;/i&gt; array (known to be 8 elements long at
compile time).  In the function itself a new binding is introduced: &lt;code&gt;result&lt;/code&gt;.  There are two
interesting things about this binding:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
The value is created only once – when the code is loaded (in this case the behaviour is similar
to &lt;i&gt;static&lt;/i&gt; variables in C and some other languages).
&lt;/li&gt;
&lt;li&gt;
The type of array is declared (using &lt;code&gt;the&lt;/code&gt; form) because the value is only known at &lt;i&gt;load time&lt;/i&gt;
(as opposed to &lt;i&gt;compile time&lt;/i&gt;).  Theoretically I'd expect the compiler to figure this out by
itself, but generally any code can be given to &lt;code&gt;load-time-value&lt;/code&gt; so the compiler apparently is
not even trying to figure anything out.

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So there, compiling &lt;code&gt;neighbours&lt;/code&gt; function now gives no warnings and is as ugly as if it was written
in C.  And since this function now returns an array, this information can now be added to &lt;code&gt;rules&lt;/code&gt;
function:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;rules&lt;/span&gt; (state neighbours)
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;declare&lt;/span&gt; (type cell state)
&lt;span class="linenr"&gt;3:  &lt;/span&gt;           (type neighbours neighbours))
&lt;span class="linenr"&gt;4:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;case&lt;/span&gt; state
&lt;span class="linenr"&gt;5:  &lt;/span&gt;    (1 2)
&lt;span class="linenr"&gt;6:  &lt;/span&gt;    (2 0)
&lt;span class="linenr"&gt;7:  &lt;/span&gt;    (t (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= 2 (count 1 neighbours)) 1 0))))
&lt;/pre&gt;




&lt;p&gt;
Benchmarking now gives numbers around 11 seconds for each simulation run (i.e., the code runs almost
two times faster than before).  The top of the profiler output now looks like this:
&lt;/p&gt;



&lt;pre class="example"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;           Self        Total        Cumul
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;  Nr  Count     %  Count     %  Count     %    Calls  Function
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;------------------------------------------------------------------------
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;   1    466  24.7    527  27.9    466  24.7        -  NEIGHBOURS
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;   2    376  19.9   1152  61.0    842  44.6        -  COUNT
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;   3    194  10.3    194  10.3   1036  54.8        -  SB-IMPL::OPTIMIZED-DATA-VECTOR-REF
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;   4    141   7.5    238  12.6   1177  62.3        -  (LAMBDA (SB-IMPL::X))
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;   5    136   7.2    136   7.2   1313  69.5        -  SB-KERNEL:HAIRY-DATA-VECTOR-REF/CHECK-BOUNDS
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;   6    127   6.7    127   6.7   1440  76.2        -  SB-KERNEL:%COERCE-CALLABLE-TO-FUN
&lt;span class="linenr"&gt;10:  &lt;/span&gt;   7    110   5.8    118   6.2   1550  82.1        -  EQL
&lt;span class="linenr"&gt;11:  &lt;/span&gt;   8     86   4.6     86   4.6   1636  86.6        -  "foreign function sigprocmask"
&lt;span class="linenr"&gt;12:  &lt;/span&gt;   9     84   4.4   1800  95.3   1720  91.1        -  EVOLVE
&lt;span class="linenr"&gt;13:  &lt;/span&gt;...
&lt;/pre&gt;




&lt;p&gt;
A few notable things:
&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;
I think I'm starting to understand the second column&lt;sup&gt;&lt;a class="footref" name="fnr.2" href="#fn.2"&gt;2&lt;/a&gt;&lt;/sup&gt;.  That would explain why &lt;code&gt;evolve&lt;/code&gt; has 95%
and other entries less.  If so, then &lt;code&gt;count&lt;/code&gt; takes about twice the time of &lt;code&gt;neighbours&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
The &lt;code&gt;vector-ref&lt;/code&gt; friends are back!  Since &lt;code&gt;neighbours&lt;/code&gt; only sets array values, it sure looks like
&lt;code&gt;count&lt;/code&gt; has dragged these into the picture.

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anyway, I have not done much lisp code profiling (and performance tuning for that matter) so I'll do
a little experiment here: write my own function to count live neighbour count:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;rules&lt;/span&gt; (state neighbours)
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;declare&lt;/span&gt; (type cell state)
&lt;span class="linenr"&gt;3:  &lt;/span&gt;           (type neighbours neighbours))
&lt;span class="linenr"&gt;4:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;case&lt;/span&gt; state
&lt;span class="linenr"&gt;5:  &lt;/span&gt;    (1 2)
&lt;span class="linenr"&gt;6:  &lt;/span&gt;    (2 0)
&lt;span class="linenr"&gt;7:  &lt;/span&gt;    (t (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= 2 (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; for x across neighbours counting (= 1 x))) 1 0))))
&lt;/pre&gt;




&lt;p&gt;
There is one compiler note for this loop – the count accumulation variable is checked for fixnum
overflow.  I'll leave it at that, wrap things up and call this my final version.
&lt;/p&gt;

&lt;p&gt;
Here are the timings (with &lt;code&gt;safety&lt;/code&gt; optimization setting of &lt;code&gt;1&lt;/code&gt; and &lt;code&gt;0&lt;/code&gt;):
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-sbcl.png"  alt="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-sbcl.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-ccl32.png"  alt="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-ccl32.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-ccl64.png"  alt="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-ccl64.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
To be honest, I did not expect getting rid of &lt;code&gt;count&lt;/code&gt; to make such a big difference.  I may only
hope that this example will let implementation developers make the compiler better!
&lt;/p&gt;
&lt;p&gt;
The detailed timings can be seen in transcripts (below), but generally they are:
&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;
3.9s / 3s for SBCL.
&lt;/li&gt;
&lt;li&gt;
8.5s / 6.9s for Clozure CL (32-bit).
&lt;/li&gt;
&lt;li&gt;
6s / 5.5s for Clozure CL (64-bit).

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm pretty sure it is possible to squeeze out some more performance out of this.  I actually think
that this kind of low-level optimisation is an art in itself, and I'm not very good at it.  That is
why I will leave this topic for now.  But readers are welcome to contribute using comments or better
yet – writing about it in their own blogs.
&lt;/p&gt;

&lt;div id="outline-container-1" class="outline-2"&gt;
&lt;h2 id="sec-1"&gt;Data files &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;


&lt;p&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/simple.lisp"&gt;simple.lisp&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-decl-sbcl-safe.txt"&gt;bm-decl-sbcl-safe.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-decl-sbcl-unsafe.txt"&gt;bm-decl-sbcl-unsafe.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-decl-ccl32-safe.txt"&gt;bm-decl-ccl32-safe.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-decl-ccl32-unsafe.txt"&gt;bm-decl-ccl32-unsafe.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-decl-ccl64-safe.txt"&gt;bm-decl-ccl64-safe.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/4-bb-cl-2/bm-decl-ccl64-unsafe.txt"&gt;bm-decl-ccl64-unsafe.txt&lt;/a&gt;
&lt;/p&gt;




&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;
&lt;p class="footnote"&gt;&lt;sup&gt;&lt;a class="footnum" name="fn.1" href="#fnr.1"&gt;1&lt;/a&gt;&lt;/sup&gt; A &lt;a href="http://paste.lisp.org/display/91435"&gt;comment&lt;/a&gt; from pkhuong on &lt;a href="http://www.cliki.net/IRC"&gt;#lisp IRC&lt;/a&gt; channel on approaching optimization (after reading a draft
version of this article):&lt;br/&gt;
&lt;br/&gt;
"Only NEIGHBOURS shows up fairly high in the list. However, before working on it directly, I'd attack
what I consider noise: &lt;code&gt;COUNT&lt;/code&gt;, &lt;code&gt;TRUNCATE&lt;/code&gt;, &lt;code&gt;LENGTH&lt;/code&gt; and &lt;code&gt;EQL&lt;/code&gt; shouldn't be called out of line.
&lt;code&gt;*-DATA-VECTOR-REF&lt;/code&gt; and &lt;code&gt;GENERIC-+&lt;/code&gt; shouldn't have to be called either.  Inserting type declarations and
asking for speed optimization where appropriate in order to eliminate these entries should be the
first step. As it is, the profile is mostly telling you that you're doing it wrong if you really
want execution speed.&lt;br/&gt;
&lt;br/&gt;
Only then, with the noise eliminated, would it be time to work on the program itself, from a new
profile (with the declarations and with speed optimization where needed) that may be radically
different from the one above."
&lt;/p&gt;
&lt;p class="footnote"&gt;&lt;sup&gt;&lt;a class="footnum" name="fn.2" href="#fnr.2"&gt;2&lt;/a&gt;&lt;/sup&gt; Another &lt;a href="http://paste.lisp.org/display/91435C#1"&gt;comment&lt;/a&gt; from pkhuong, this time about the columns in profiler output (confirming my
understanding of the "total" column):&lt;br/&gt;
&lt;br/&gt;
"Count" columns count samples, "%" % of time (samples). Self is the amount/% of samples with the
function at the right at the top of the call stack (directly executing), total is the amount/% of
time with the function somewhere on the call stack (up to a certain depth).  Cumul is the sum of
Self values; I find it helpful to guesstimate where the repartition of self times tapers off into a
long tail.&lt;br/&gt;
&lt;br/&gt;
The cycle output (not shown here nor in the blog post) is extremely useful if you know that
a lot of time is spent in a function, either directly or in total, but now what the callers
or the callees are. You'll be able to find the function you're looking for, the most common
(in terms of sample) callers and callee. With that information, you could decide how to make
a certain callee faster (or special-case it for the slow function), change the way the function
is used in the slow callers, special-case the function for the slow callers, etc. The cycle
output is somewhat harder to read, but the format is the same as that by gprof, so you can
look around for more resources.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6424155829455583334-6521927392701820525?l=t-b-o-g.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://t-b-o-g.blogspot.com/feeds/6521927392701820525/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://t-b-o-g.blogspot.com/2009/12/brians-brain-on-common-lisp-take-3.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6424155829455583334/posts/default/6521927392701820525'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6424155829455583334/posts/default/6521927392701820525'/><link rel='alternate' type='text/html' href='http://t-b-o-g.blogspot.com/2009/12/brians-brain-on-common-lisp-take-3.html' title='Brian&apos;s brain, on Common Lisp, take 3'/><author><name>Jānis Džeriņš</name><uri>http://www.blogger.com/profile/06846740575608764708</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6424155829455583334.post-4482646819289895248</id><published>2009-10-29T15:16:00.004+02:00</published><updated>2009-11-01T21:05:00.177+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Common Lisp'/><title type='text'>Brian's brain, on Common Lisp, take 2</title><content type='html'>&lt;div id="content"&gt;
&lt;p&gt;While writing code for the &lt;a href="http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-common-lisp.html"&gt;first version&lt;/a&gt; of &lt;a href="http://en.wikipedia.org/wiki/Brian%27s_Brain"&gt;Brian's Brain&lt;/a&gt; some things crept into my mind, but I left
them out of the article to keep it focused.  The main issues are:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
Using symbols to represent cells is not very efficient since a symbol is a kind of object, which
means it is a pointer, which means it takes 32 or 64 bits of storage.  2 bits would be enough in
our case.
&lt;/li&gt;
&lt;li&gt;
Computer memory is linear (well, from a programmer's point of view), and 2-dimensional arrays are
figments of our imagination.  I mean, in the end it's still a 1-dimensional array, with some
support from the programming language to make it appear as something else.  Is it worth the
trouble to implement the 2D world with a 1D array if the language (Common Lisp in this case)
provides us with 2D arrays?
&lt;/li&gt;
&lt;li&gt;
Currently a new array is allocated for each simulation step, which is not strictly necessary.  Each
simulation step needs only two arrays – for current world state and the next one.  Since all
cells of the new state are overwritten on each step, we don't need to create a completely new
array to store them – we can reuse the one from previous step.

&lt;/li&gt;
&lt;/ul&gt;

&lt;div id="outline-container-1" class="outline-2"&gt;
&lt;h2 id="sec-1"&gt;Specialised arrays &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;


&lt;p&gt;
I'll start with changing the cell representation – instead of symbols I'll use numbers.  And the
world is going to be a suitable array.  This is also a good time to tell a bit about arrays and
their types to those unfamiliar with Common Lisp.
&lt;/p&gt;
&lt;p&gt;
The arrays I used for the first implementation of Brian's Brain were of type &lt;code&gt;T&lt;/code&gt;, which is the most
generic array type possible, and it can store &lt;i&gt;any&lt;/i&gt; kind of objects.  It is also possible to specify
the type of objects that will be stored in an array when creating it.  And Common Lisp
implementation will choose the most specific type of array that can store objects of specified type.
&lt;/p&gt;
&lt;p&gt;
In my case I want to use array store numbers from 0 to 2.  There is an &lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/t_intege.htm"&gt;&lt;code&gt;integer&lt;/code&gt;&lt;/a&gt; type in Common Lisp
which is the type of &lt;i&gt;mathematical integer&lt;/i&gt; (with unlimited magnitude).  It is also possible to
limit the magnitude by specifying the lower and upper limit.  Here's how to create an array that is
suitable for storing integers from 0 to 2 in Common Lisp (SBCL):
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;CL-USER&amp;gt; (make-array 7 &lt;span style="color: #cd0000;"&gt;:element-type&lt;/span&gt; '(integer 0 2))
&lt;span class="linenr"&gt;2:  &lt;/span&gt;#(0 0 0 0 0 0 0)
&lt;/pre&gt;




&lt;p&gt;
We can also check what is the exact type of the object that has been created:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;3:  &lt;/span&gt;CL-USER&amp;gt; (type-of (make-array 7 &lt;span style="color: #cd0000;"&gt;:element-type&lt;/span&gt; '(integer 0 2)))
&lt;span class="linenr"&gt;4:  &lt;/span&gt;(SIMPLE-ARRAY (UNSIGNED-BYTE 2) (7))
&lt;/pre&gt;




&lt;p&gt;
As we can see, the type of the array created is &lt;code&gt;(unsigned-byte 2)&lt;/code&gt;, which means all non-negative
integers representable by 2 bits (see the documentation of &lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/t_unsgn_.htm"&gt;&lt;code&gt;unsigned-byte&lt;/code&gt;&lt;/a&gt; for more details).
The type returned is more general than what I have asked for, but as I said, it is up to Common Lisp
implementation to choose what type of array to create.  There is also a function that can tell what
type of array will be created given a type specifier.  Let's see, again on SBCL:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;5:  &lt;/span&gt;CL-USER&amp;gt; (upgraded-array-element-type '(integer 0 2))
&lt;span class="linenr"&gt;6:  &lt;/span&gt;(UNSIGNED-BYTE 2)
&lt;span class="linenr"&gt;7:  &lt;/span&gt;CL-USER&amp;gt; (upgraded-array-element-type '(integer 0 42))
&lt;span class="linenr"&gt;8:  &lt;/span&gt;(UNSIGNED-BYTE 7)
&lt;/pre&gt;




&lt;p&gt;
A different Common Lisp implementation is allowed to work differently.  Here is what 32-bit Clozure
CL tells us:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt; 9:  &lt;/span&gt;CL-USER&amp;gt; (upgraded-array-element-type '(integer 0 2))
&lt;span class="linenr"&gt;10:  &lt;/span&gt;(UNSIGNED-BYTE 8)
&lt;span class="linenr"&gt;11:  &lt;/span&gt;CL-USER&amp;gt; (upgraded-array-element-type '(integer 0 42))
&lt;span class="linenr"&gt;12:  &lt;/span&gt;(UNSIGNED-BYTE 8)
&lt;span class="linenr"&gt;13:  &lt;/span&gt;CL-USER&amp;gt; (type-of (make-array 7 &lt;span style="color: #cd0000;"&gt;:element-type&lt;/span&gt; '(integer 0 2)))
&lt;span class="linenr"&gt;14:  &lt;/span&gt;(SIMPLE-ARRAY (UNSIGNED-BYTE 8) (7))
&lt;/pre&gt;




&lt;p&gt;
As you can see, in Clozure CL 8-bit-byte is the most specific array type available for arrays of
small integers.  There are many places in Common Lisp specification where implementations are given
freedom of implementation choices.  For instance, this specific issue is described in
&lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/15_aba.htm"&gt;array upgrading&lt;/a&gt; section.
&lt;/p&gt;
&lt;p&gt;
So now we are ready to work with numbers as cell representations. 0 for dead cell, 1 for alive cell,
and 2 for dying cell:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;15:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;make-brain&lt;/span&gt; (w h)
&lt;span class="linenr"&gt;16:  &lt;/span&gt;  (make-array (list h w) &lt;span style="color: #cd0000;"&gt;:element-type&lt;/span&gt; '(integer 0 2)))
&lt;span class="linenr"&gt;17:  &lt;/span&gt;
&lt;span class="linenr"&gt;18:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;make-initialised-brain&lt;/span&gt; (w h)
&lt;span class="linenr"&gt;19:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((cells (make-brain w h))
&lt;span class="linenr"&gt;20:  &lt;/span&gt;        (mid (floor w 2)))
&lt;span class="linenr"&gt;21:  &lt;/span&gt;    (setf (aref cells 0 mid) 1)
&lt;span class="linenr"&gt;22:  &lt;/span&gt;    (setf (aref cells 0 (1+ mid)) 1)
&lt;span class="linenr"&gt;23:  &lt;/span&gt;    cells))
&lt;span class="linenr"&gt;24:  &lt;/span&gt;
&lt;span class="linenr"&gt;25:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;rules&lt;/span&gt; (state neighbours)
&lt;span class="linenr"&gt;26:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;case&lt;/span&gt; state
&lt;span class="linenr"&gt;27:  &lt;/span&gt;    (1 2)
&lt;span class="linenr"&gt;28:  &lt;/span&gt;    (2 0)
&lt;span class="linenr"&gt;29:  &lt;/span&gt;    (t (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= 2 (count 1 neighbours)) 1 0))))
&lt;/pre&gt;




&lt;p&gt;
All other code stays as it was.  Now is a good time to recall the odd graph from last time (scaled
to match other graphs below):
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-symbolic-sbcl.png"  alt="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-symbolic-sbcl.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
And the new version:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-num-sbcl.png"  alt="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-num-sbcl.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
Quite a nice improvement for such a small change.  I guess my theory about GCing large arrays was
correct. The times for Clozure CL have also improved, not quite that dramatically though:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-num-ccl32.png"  alt="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-num-ccl32.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-num-ccl64.png"  alt="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-num-ccl64.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
What do we learn?  Lisp programmers can learn the price of some things&lt;sup&gt;&lt;a class="footref" name="fnr.1" href="#fn.1"&gt;1&lt;/a&gt;&lt;/sup&gt;.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-2" class="outline-2"&gt;
&lt;h2 id="sec-2"&gt;Destructive action &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;


&lt;p&gt;
Next step is to avoid allocating a new array at each evolution step.  I'm not expecting any serious
performance gains from this because at this point each iteration is creating a single object that
does not have to be traversed by GC.  Let's see if I can shed some light on the believers in manual
memory management lurking in the dark corners of &lt;a href="http://www.urbandictionary.com/define.php?term=intarweb"&gt;intarwebs&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
First I'll make &lt;code&gt;evolve&lt;/code&gt; take another parameter – the array where the next generation cells will be
put into:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;30:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;evolve&lt;/span&gt; (src dst)
&lt;span class="linenr"&gt;31:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let*&lt;/span&gt; ((w (array-dimension src 1))
&lt;span class="linenr"&gt;32:  &lt;/span&gt;         (h (array-dimension src 0)))
&lt;span class="linenr"&gt;33:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; for j below h
&lt;span class="linenr"&gt;34:  &lt;/span&gt;       do (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; for i below w
&lt;span class="linenr"&gt;35:  &lt;/span&gt;             do (setf (aref dst j i)
&lt;span class="linenr"&gt;36:  &lt;/span&gt;                      (funcall 'rules (aref src j i) (neighbours src i j)))))
&lt;span class="linenr"&gt;37:  &lt;/span&gt;    dst))
&lt;/pre&gt;




&lt;p&gt;
Easy, one parameter more, one line less.  Now &lt;code&gt;simulate&lt;/code&gt; will need two brains and will alternate
them at each call to &lt;code&gt;evolve&lt;/code&gt;:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;38:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;simulate&lt;/span&gt; (steps a b)
&lt;span class="linenr"&gt;39:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; repeat steps
&lt;span class="linenr"&gt;40:  &lt;/span&gt;     do (psetf a (evolve a b)
&lt;span class="linenr"&gt;41:  &lt;/span&gt;               b a)
&lt;span class="linenr"&gt;42:  &lt;/span&gt;     finally (&lt;span style="color: #4682b4; font-weight: bold;"&gt;return&lt;/span&gt; a)))
&lt;/pre&gt;




&lt;p&gt;
&lt;code&gt;benchmark&lt;/code&gt; also must be updated to pass brain (of proper size) to &lt;code&gt;simulate&lt;/code&gt;:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;43:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;benchmark&lt;/span&gt; ()
&lt;span class="linenr"&gt;44:  &lt;/span&gt;  (format *trace-output* &lt;span style="color: #008b00;"&gt;"Benchmarking on ~A ~A~%"&lt;/span&gt;
&lt;span class="linenr"&gt;45:  &lt;/span&gt;          (lisp-implementation-type)
&lt;span class="linenr"&gt;46:  &lt;/span&gt;          (lisp-implementation-version))
&lt;span class="linenr"&gt;47:  &lt;/span&gt;  &lt;span style="color: #ffa500;"&gt;;; &lt;/span&gt;&lt;span style="color: #ffa500;"&gt;Warmup.
&lt;span class="linenr"&gt;48:  &lt;/span&gt;&lt;/span&gt;  (simulate 10000 (make-initialised-brain 16 16))
&lt;span class="linenr"&gt;49:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt;
&lt;span class="linenr"&gt;50:  &lt;/span&gt;     for (w h i) in '((32    32  32768)
&lt;span class="linenr"&gt;51:  &lt;/span&gt;                      (64    64  8192)
&lt;span class="linenr"&gt;52:  &lt;/span&gt;                      (128  128  2048)
&lt;span class="linenr"&gt;53:  &lt;/span&gt;                      (256  256  512)
&lt;span class="linenr"&gt;54:  &lt;/span&gt;                      (512  512  128)
&lt;span class="linenr"&gt;55:  &lt;/span&gt;                      (1024 1024 32)
&lt;span class="linenr"&gt;56:  &lt;/span&gt;                      (2048 2048 8)
&lt;span class="linenr"&gt;57:  &lt;/span&gt;                      (4096 4096 2))
&lt;span class="linenr"&gt;58:  &lt;/span&gt;     do #+ccl (gc)
&lt;span class="linenr"&gt;59:  &lt;/span&gt;        #+sbcl (gc &lt;span style="color: #cd0000;"&gt;:full&lt;/span&gt; t)
&lt;span class="linenr"&gt;60:  &lt;/span&gt;        (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((initial (make-initialised-brain w h))
&lt;span class="linenr"&gt;61:  &lt;/span&gt;              (spare (make-brain w h)))
&lt;span class="linenr"&gt;62:  &lt;/span&gt;          (format *trace-output* &lt;span style="color: #008b00;"&gt;"*** ~Dx~D ~D iteration~:P ***~%"&lt;/span&gt; w h i)
&lt;span class="linenr"&gt;63:  &lt;/span&gt;          (time (simulate i initial spare))
&lt;span class="linenr"&gt;64:  &lt;/span&gt;          (finish-output *trace-output*)))
&lt;span class="linenr"&gt;65:  &lt;/span&gt;  (values))
&lt;/pre&gt;




&lt;p&gt;
Here, take a close look at the graphs now:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-des-sbcl.png"  alt="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-des-sbcl.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-des-ccl32.png"  alt="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-des-ccl32.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-des-ccl64.png"  alt="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-des-ccl64.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
Notice anything interesting?  Right, the performance has not improved noticeably.  It has actually
decreased slightly for SBCL and 64-bit Clozure CL.  What's the lesson?  Garbage Collector will make
your life easier (if you don't deliberately spam it).
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-3" class="outline-2"&gt;
&lt;h2 id="sec-3"&gt;Dimension warp &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;


&lt;p&gt;
The last exercise today will be to test whether it is worth the trouble of implementing
2-dimensional arrays on top of 1-dimensional arrays manually.  This means doing the index
calculations manually.  I'll branch this experiment off the numerical branch&lt;sup&gt;&lt;a class="footref" name="fnr.2" href="#fn.2"&gt;2&lt;/a&gt;&lt;/sup&gt;.
&lt;/p&gt;
&lt;p&gt;
Let's start with the easy things – creating the arrays:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;66:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;make-brain&lt;/span&gt; (w h)
&lt;span class="linenr"&gt;67:  &lt;/span&gt;  (make-array (* w h) &lt;span style="color: #cd0000;"&gt;:element-type&lt;/span&gt; '(integer 0 2)))
&lt;/pre&gt;




&lt;p&gt;
All that has to be changed in &lt;code&gt;make-initialised-brain&lt;/code&gt; is to remove the extra dimension (because we
only change the first row, and first row in our array are first &lt;code&gt;w&lt;/code&gt; cells):
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;68:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;make-initialised-brain&lt;/span&gt; (w h)
&lt;span class="linenr"&gt;69:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((cells (make-brain w h))
&lt;span class="linenr"&gt;70:  &lt;/span&gt;        (mid (floor w 2)))
&lt;span class="linenr"&gt;71:  &lt;/span&gt;    (setf (aref cells mid) 1)
&lt;span class="linenr"&gt;72:  &lt;/span&gt;    (setf (aref cells (1+ mid)) 1)
&lt;span class="linenr"&gt;73:  &lt;/span&gt;    cells))
&lt;/pre&gt;




&lt;p&gt;
The main function that has to be changed is &lt;code&gt;neighbours&lt;/code&gt;.  The function is very simple, although a
bit lengthy (comparing to other &lt;i&gt;incorrect&lt;/i&gt; versions), but I like to keep the intent of the code
close to how I'd do things myself if I were pretending to be a computer:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;74:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;neighbours&lt;/span&gt; (cells i w h)
&lt;span class="linenr"&gt;75:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((l (length cells))
&lt;span class="linenr"&gt;76:  &lt;/span&gt;        (mx (1- w))
&lt;span class="linenr"&gt;77:  &lt;/span&gt;        (my (1- h)))
&lt;span class="linenr"&gt;78:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;multiple-value-bind&lt;/span&gt; (y x)
&lt;span class="linenr"&gt;79:  &lt;/span&gt;        (truncate i w)
&lt;span class="linenr"&gt;80:  &lt;/span&gt;      (&lt;span style="color: #4682b4; font-weight: bold;"&gt;flet&lt;/span&gt; ((up (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (zerop y) (- (+ i l) w) (- i w)))
&lt;span class="linenr"&gt;81:  &lt;/span&gt;             (dn (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= y  my) (- (+ i w) l) (+ i w)))
&lt;span class="linenr"&gt;82:  &lt;/span&gt;             (lt (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (zerop x) (1- (+ i w))  (1- i)))
&lt;span class="linenr"&gt;83:  &lt;/span&gt;             (rt (i) (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= x  mx) (1+ (- i w))  (1+ i))))
&lt;span class="linenr"&gt;84:  &lt;/span&gt;        (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let*&lt;/span&gt; ((u (up i))
&lt;span class="linenr"&gt;85:  &lt;/span&gt;               (d (dn i))
&lt;span class="linenr"&gt;86:  &lt;/span&gt;               (l (lt i))
&lt;span class="linenr"&gt;87:  &lt;/span&gt;               (r (rt i))
&lt;span class="linenr"&gt;88:  &lt;/span&gt;               (ul (lt u))
&lt;span class="linenr"&gt;89:  &lt;/span&gt;               (ur (rt u))
&lt;span class="linenr"&gt;90:  &lt;/span&gt;               (dl (lt d))
&lt;span class="linenr"&gt;91:  &lt;/span&gt;               (dr (rt d)))
&lt;span class="linenr"&gt;92:  &lt;/span&gt;          (mapcar (&lt;span style="color: #4682b4; font-weight: bold;"&gt;lambda&lt;/span&gt; (i) (aref cells i))
&lt;span class="linenr"&gt;93:  &lt;/span&gt;                  (list ul u ur l r dl d dr)))))))
&lt;/pre&gt;




&lt;p&gt;
&lt;code&gt;cells&lt;/code&gt; is the 1-dimensional array that is used to simulate the 2-dimensional array. &lt;code&gt;i&lt;/code&gt; is the
index of a cell for which the neighbours should be calculated.  And since &lt;code&gt;cells&lt;/code&gt; is a 1-dimensional
array and does not have any information about what dimensions are stored in it, &lt;code&gt;w&lt;/code&gt; and &lt;code&gt;h&lt;/code&gt; specify
the dimensions of 2-dimensional array.  (If it is not clear to you why these dimensions must be
specified think about how many different 2-dimensional arrays can be stored in 24-element
1-dimensional array).
&lt;/p&gt;
&lt;p&gt;
I have written this function in a way that allows me to think about the 1D array as a 2D array.  So
from &lt;code&gt;i&lt;/code&gt; and &lt;code&gt;w&lt;/code&gt; the &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; coordinates are calculated.  These are used to detect if the cell
under question is on the top (&lt;code&gt;0&lt;/code&gt;) row or leftmost (&lt;code&gt;0&lt;/code&gt;) column.  For bottom row and rightmost
column I have &lt;code&gt;my&lt;/code&gt; and &lt;code&gt;mx&lt;/code&gt;, respectively.
&lt;/p&gt;
&lt;p&gt;
Then come four local utility functions to determine an index that is up (&lt;code&gt;up&lt;/code&gt;), down (&lt;code&gt;dn&lt;/code&gt;), left
(&lt;code&gt;lt&lt;/code&gt;) or right (&lt;code&gt;rt&lt;/code&gt;) of a given cell.  The logic is the same as in the 2D array version.
&lt;/p&gt;
&lt;p&gt;
Next I calculate indices for up, down, left and right cells using the above mentioned functions.  The
cells on diagonals are calculated relative to these.  For instance, up-left cell can be obtained by
going up from left cell or going left from up cell.
&lt;/p&gt;
&lt;p&gt;
In the end the obtained indices are used to get cell values from &lt;code&gt;cells&lt;/code&gt; array.  And that's all
there is to it.
&lt;/p&gt;
&lt;p&gt;
Now &lt;code&gt;evolve&lt;/code&gt; also can be simplified a bit since only one dimension has to be traversed.  However,
the &lt;i&gt;brain&lt;/i&gt; dimensions must be specified explicitly since the dimensions of the brain cannot be
obtained from the 1D array.
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt; 94:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;evolve&lt;/span&gt; (src w h)
&lt;span class="linenr"&gt; 95:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let*&lt;/span&gt; ((l (length src))
&lt;span class="linenr"&gt; 96:  &lt;/span&gt;         (dst (make-brain w h)))
&lt;span class="linenr"&gt; 97:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; for i below l
&lt;span class="linenr"&gt; 98:  &lt;/span&gt;       do (setf (aref dst i)
&lt;span class="linenr"&gt; 99:  &lt;/span&gt;                (funcall 'rules (aref src i) (neighbours src i w h))))
&lt;span class="linenr"&gt;100:  &lt;/span&gt;    dst))
&lt;/pre&gt;




&lt;p&gt;
The dimensions must be dragged through &lt;code&gt;simulate&lt;/code&gt; function, too:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;101:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;simulate&lt;/span&gt; (steps initial w h)
&lt;span class="linenr"&gt;102:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; with brain = initial
&lt;span class="linenr"&gt;103:  &lt;/span&gt;     repeat steps
&lt;span class="linenr"&gt;104:  &lt;/span&gt;     do (setf brain (funcall 'evolve brain w h))
&lt;span class="linenr"&gt;105:  &lt;/span&gt;     finally (&lt;span style="color: #4682b4; font-weight: bold;"&gt;return&lt;/span&gt; brain)))
&lt;/pre&gt;




&lt;p&gt;
The code gets ever uglier.  Another way to solve this problem would be to create a container object
(like a &lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/m_defstr.htm"&gt;structure&lt;/a&gt; or &lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/m_defcla.htm"&gt;class&lt;/a&gt;) to hold the cells array and the two dimensions.  But that is really not
the point of this exercise – I actually expect the gains from this change to be so minimal (if any!)
that it is not worth the trouble at all.
&lt;/p&gt;
&lt;p&gt;
OK, last change – &lt;i&gt;brain&lt;/i&gt; dimensions must be passed to &lt;code&gt;simulate&lt;/code&gt;:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;106:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;benchmark&lt;/span&gt; ()
&lt;span class="linenr"&gt;107:  &lt;/span&gt;  (format *trace-output* &lt;span style="color: #008b00;"&gt;"Benchmarking on ~A ~A~%"&lt;/span&gt;
&lt;span class="linenr"&gt;108:  &lt;/span&gt;          (lisp-implementation-type)
&lt;span class="linenr"&gt;109:  &lt;/span&gt;          (lisp-implementation-version))
&lt;span class="linenr"&gt;110:  &lt;/span&gt;  &lt;span style="color: #ffa500;"&gt;;; &lt;/span&gt;&lt;span style="color: #ffa500;"&gt;Warmup.
&lt;span class="linenr"&gt;111:  &lt;/span&gt;&lt;/span&gt;  (simulate 10000 (make-initialised-brain 16 16) 16 16)
&lt;span class="linenr"&gt;112:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt;
&lt;span class="linenr"&gt;113:  &lt;/span&gt;     for (w h i) in '((32    32  32768)
&lt;span class="linenr"&gt;114:  &lt;/span&gt;                      (64    64  8192)
&lt;span class="linenr"&gt;115:  &lt;/span&gt;                      (128  128  2048)
&lt;span class="linenr"&gt;116:  &lt;/span&gt;                      (256  256  512)
&lt;span class="linenr"&gt;117:  &lt;/span&gt;                      (512  512  128)
&lt;span class="linenr"&gt;118:  &lt;/span&gt;                      (1024 1024 32)
&lt;span class="linenr"&gt;119:  &lt;/span&gt;                      (2048 2048 8)
&lt;span class="linenr"&gt;120:  &lt;/span&gt;                      (4096 4096 2))
&lt;span class="linenr"&gt;121:  &lt;/span&gt;     do #+ccl (gc)
&lt;span class="linenr"&gt;122:  &lt;/span&gt;        #+sbcl (gc &lt;span style="color: #cd0000;"&gt;:full&lt;/span&gt; t)
&lt;span class="linenr"&gt;123:  &lt;/span&gt;        (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((initial (make-initialised-brain w h)))
&lt;span class="linenr"&gt;124:  &lt;/span&gt;          (format *trace-output* &lt;span style="color: #008b00;"&gt;"*** ~Dx~D ~D iteration~:P ***~%"&lt;/span&gt; w h i)
&lt;span class="linenr"&gt;125:  &lt;/span&gt;          (time (simulate i initial w h))
&lt;span class="linenr"&gt;126:  &lt;/span&gt;          (finish-output *trace-output*)))
&lt;span class="linenr"&gt;127:  &lt;/span&gt;  (values))
&lt;/pre&gt;




&lt;p&gt;
So, let's see what do we get:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-1d-sbcl.png"  alt="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-1d-sbcl.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-1d-ccl32.png"  alt="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-1d-ccl32.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-1d-ccl64.png"  alt="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-1d-ccl64.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
Oh man was I wrong!  Why would the difference be so big?  A profiler would come in handy, but before
that I'd like to do another experiment because I have a feeling about the source of slowness: there
are no declarations and generic array access code for 2D arrays might be more complex than for 1D
arrays.  Or maybe I should give profiler a shot?
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-4" class="outline-2"&gt;
&lt;h2 id="sec-4"&gt;Looking into future &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;


&lt;p&gt;
The problem is that performance tweaking this brute-force solution seems a bit pointless.  The point
of these articles was to show how &lt;b&gt;I&lt;/b&gt; would implement the &lt;a href="http://en.wikipedia.org/wiki/Brian%27s_Brain"&gt;Brian's Brain&lt;/a&gt; in Common Lisp (and do some
apples-to-oranges comparisons to spice things up).  A really interesting thing to try out would be
to implement &lt;a href="http://en.wikipedia.org/wiki/Hashlife"&gt;Hashlife&lt;/a&gt;, which promises an astronomic speed explosion, and see how it works with
&lt;a href="http://en.wikipedia.org/wiki/Brian%27s_Brain"&gt;Brian's Brain&lt;/a&gt;.  And I suspect that generating "6 octillion generations in 30 seconds" would not
quite work out because &lt;a href="http://en.wikipedia.org/wiki/Brian%27s_Brain"&gt;Brian's Brain&lt;/a&gt; is a lot more chaotic (I have not noticed a single &lt;a href="http://en.wikipedia.org/wiki/Oscillator_(CA)"&gt;oscillator&lt;/a&gt;
while watching my animations) – the presence of "dead" cells force a lot of movement (and
expansion).  I noticed quite a few &lt;a href="http://en.wikipedia.org/wiki/Spaceship_(CA)"&gt;spaceships&lt;/a&gt;, though.
&lt;/p&gt;
&lt;p&gt;
I also installed &lt;a href="http://golly.sourceforge.net/"&gt;Golly&lt;/a&gt; – a wonderful application to play with &lt;a href="http://en.wikipedia.org/wiki/Cellular_automaton"&gt;cellular automata&lt;/a&gt;.  It has the
&lt;a href="http://en.wikipedia.org/wiki/Hashlife"&gt;Hashlife&lt;/a&gt; algorithm implemented, and it really does wonders.  However it did not work out for &lt;a href="http://en.wikipedia.org/wiki/Brian%27s_Brain"&gt;Brian's Brain&lt;/a&gt; (starting with 2 live cells as I've been doing all this time) because:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
In 30 seconds it does not even get to generation 2000.
&lt;/li&gt;
&lt;li&gt;
The pattern is ever-expanding and I could not find any way to limit the world size (i.e., I think
the torus-world is not implemented in &lt;a href="http://golly.sourceforge.net/"&gt;Golly&lt;/a&gt;).

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I'd have to implement &lt;a href="http://en.wikipedia.org/wiki/Hashlife"&gt;Hashlife&lt;/a&gt; myself.  Except that I'm hesitant to hand out $30 for the original
&lt;a href="http://www.sciencedirect.com/science?_ob=ArticleListURL&amp;amp;_method=list&amp;amp;_ArticleListID=1069301974&amp;amp;_sort=r&amp;amp;view=c&amp;amp;_acct=C000030180&amp;amp;_version=1&amp;amp;_urlVersion=0&amp;amp;_userid=8592904&amp;amp;md5=1481d4a49637a9e8eccab8f8ba0f2371"&gt;B. Gosper's paper&lt;/a&gt;, but might as well do it in the future (since as an extra bonus the original
implementation was done on &lt;a href="http://en.wikipedia.org/wiki/Symbolics"&gt;Lisp Machine&lt;/a&gt;).
&lt;/p&gt;
&lt;p&gt;
So, I think I'll do one more iteration with the brute-force method (with profiling and
declarations).  And then we'll see what will happen.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-5" class="outline-2"&gt;
&lt;h2 id="sec-5"&gt;Data files &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;


&lt;p&gt;
Sources:
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/numeric.lisp"&gt;numeric.lisp&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/destructive.lisp"&gt;destructive.lisp&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/1d.lisp"&gt;1d.lisp&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
Benchmarking transcripts:
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-num-sbcl.txt"&gt;bm-num-sbcl.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-num-ccl32.txt"&gt;bm-num-ccl32.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-num-ccl64.txt"&gt;bm-num-ccl64.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-des-sbcl.txt"&gt;bm-des-sbcl.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-des-ccl32.txt"&gt;bm-des-ccl32.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-des-ccl64.txt"&gt;bm-des-ccl64.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-1d-sbcl.txt"&gt;bm-1d-sbcl.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-1d-ccl32.txt"&gt;bm-1d-ccl32.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/3-bb-cl/bm-1d-ccl64.txt"&gt;bm-1d-ccl64.txt&lt;/a&gt;
&lt;/p&gt;


&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-6" class="outline-2"&gt;
&lt;h2 id="sec-6"&gt;Updates &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;


&lt;p&gt;
2009-11-01: Please don't send me any more copies of &lt;a href="http://www.sciencedirect.com/science?_ob=ArticleListURL&amp;amp;_method=list&amp;amp;_ArticleListID=1069301974&amp;amp;_sort=r&amp;amp;view=c&amp;amp;_acct=C000030180&amp;amp;_version=1&amp;amp;_urlVersion=0&amp;amp;_userid=8592904&amp;amp;md5=1481d4a49637a9e8eccab8f8ba0f2371"&gt;B. Gosper's paper&lt;/a&gt; :)
&lt;/p&gt;



&lt;/div&gt;
&lt;/div&gt;

&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;
&lt;p class="footnote"&gt;&lt;sup&gt;&lt;a class="footnum" name="fn.1" href="#fnr.1"&gt;1&lt;/a&gt;&lt;/sup&gt; Alan Perlis, Epigrams in Programming, "55. A LISP programmer knows the value of everything, but
the cost of nothing."
&lt;/p&gt;
&lt;p class="footnote"&gt;&lt;sup&gt;&lt;a class="footnum" name="fn.2" href="#fnr.2"&gt;2&lt;/a&gt;&lt;/sup&gt; Yes, I'm using &lt;a href="http://git-scm.com/"&gt;git&lt;/a&gt;, and branching like crazy, even in this toy project.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6424155829455583334-4482646819289895248?l=t-b-o-g.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://t-b-o-g.blogspot.com/feeds/4482646819289895248/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-common-lisp-take-2.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6424155829455583334/posts/default/4482646819289895248'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6424155829455583334/posts/default/4482646819289895248'/><link rel='alternate' type='text/html' href='http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-common-lisp-take-2.html' title='Brian&apos;s brain, on Common Lisp, take 2'/><author><name>Jānis Džeriņš</name><uri>http://www.blogger.com/profile/06846740575608764708</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6424155829455583334.post-2881684569615265161</id><published>2009-10-26T20:40:00.009+02:00</published><updated>2009-10-29T16:10:44.952+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Common Lisp'/><title type='text'>Brian's Brain on Common Lisp</title><content type='html'>&lt;div id="content"&gt;

&lt;div id="outline-container-1" class="outline-2"&gt;
&lt;h2 id="sec-1"&gt;The real thing &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;


&lt;p&gt;
If you are reading this you might have already read the previous entry and are familiar with the
problem domain.  And today I'm going to walk you through implementing Brian's Brain in Common Lisp.
First, the straight-forward implementation with &lt;i&gt;brain&lt;/i&gt; represented as a 2-dimensional array (that's
what it is, right?), with just enough code to get it running.  There are no functions to abstract
away the &lt;i&gt;brain&lt;/i&gt; implementation details or cell representation or anything else.
&lt;/p&gt;
&lt;p&gt;
First, a function to create a &lt;i&gt;brain&lt;/i&gt;:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;make-brain&lt;/span&gt; (w h)
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (make-array (list h w) &lt;span style="color: #cd0000;"&gt;:initial-element&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;))
&lt;/pre&gt;




&lt;p&gt;
And another function that will make us an initialised brain (like the one in Clojure version):
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;make-initialised-brain&lt;/span&gt; (w h)
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((cells (make-brain w h))
&lt;span class="linenr"&gt;3:  &lt;/span&gt;        (mid (floor w 2)))
&lt;span class="linenr"&gt;4:  &lt;/span&gt;    (setf (aref cells 0 mid) &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;)
&lt;span class="linenr"&gt;5:  &lt;/span&gt;    (setf (aref cells 0 (1+ mid)) &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;)
&lt;span class="linenr"&gt;6:  &lt;/span&gt;    cells))
&lt;/pre&gt;





&lt;p&gt;
Why are the dimensions to &lt;code&gt;make-array&lt;/code&gt; passed as &lt;code&gt;(h w)&lt;/code&gt; and not &lt;code&gt;(w h)&lt;/code&gt; you might ask?  Because I
like to see that my functions work as soon as I write them.  Let's see how it works:
&lt;/p&gt;



&lt;pre class="example"&gt;CL-USER&amp;gt; (make-initialised-brain 7 5)
#2A((:OFF :OFF :OFF :ON  :ON  :OFF :OFF)
    (:OFF :OFF :OFF :OFF :OFF :OFF :OFF)
    (:OFF :OFF :OFF :OFF :OFF :OFF :OFF)
    (:OFF :OFF :OFF :OFF :OFF :OFF :OFF)
    (:OFF :OFF :OFF :OFF :OFF :OFF :OFF))
&lt;/pre&gt;




&lt;p&gt;
What would happen if we had them in the opposite order:
&lt;/p&gt;



&lt;pre class="example"&gt;CL-USER&amp;gt; (make-initialised-brain 5 7)
#2A((:OFF :OFF :ON  :ON  :OFF)
    (:OFF :OFF :OFF :OFF :OFF)
    (:OFF :OFF :OFF :OFF :OFF)
    (:OFF :OFF :OFF :OFF :OFF)
    (:OFF :OFF :OFF :OFF :OFF)
    (:OFF :OFF :OFF :OFF :OFF)
    (:OFF :OFF :OFF :OFF :OFF))
&lt;/pre&gt;




&lt;p&gt;
Incidentally, this is how Lau's version has them (if you have followed the examples carefully
enough).  There is no real difference having them either way, only a matter of convenience: if I
look at the brain in &lt;i&gt;REPL&lt;/i&gt; I want to see the same thing I'd see in animated graphical output.
&lt;/p&gt;
&lt;p&gt;
The rules are independent of &lt;i&gt;brain&lt;/i&gt; representation as long as we can provide the neighbouring cells
as a parameter:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;rules&lt;/span&gt; (state neighbours)
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;case&lt;/span&gt; state
&lt;span class="linenr"&gt;3:  &lt;/span&gt;    (&lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;    &lt;span style="color: #cd0000;"&gt;:dying&lt;/span&gt;)
&lt;span class="linenr"&gt;4:  &lt;/span&gt;    (&lt;span style="color: #cd0000;"&gt;:dying&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;)
&lt;span class="linenr"&gt;5:  &lt;/span&gt;    (t (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= 2 (count &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt; neighbours)) &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;))))
&lt;/pre&gt;




&lt;p&gt;
How do we find the neighbours of a given cell?  Easy, like this:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;neighbours&lt;/span&gt; (cells x y)
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let*&lt;/span&gt; ((mx (1- (array-dimension cells 1)))
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;         (my (1- (array-dimension cells 0)))
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;         (l (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (zerop x) mx (1- x)))
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;         (r (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= x mx) 0 (1+ x)))
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;         (u (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (zerop y) my (1- y)))
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;         (d (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (= y my) 0 (1+ y))))
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;    (mapcar (&lt;span style="color: #4682b4; font-weight: bold;"&gt;lambda&lt;/span&gt; (x y)
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;              (aref cells y x))
&lt;span class="linenr"&gt;10:  &lt;/span&gt;            (list l x r l r l x r)
&lt;span class="linenr"&gt;11:  &lt;/span&gt;            (list u u u y y d d d))))
&lt;/pre&gt;




&lt;p&gt;
What happens here should be pretty obvious, but I'll explain a bit anyway since this is a one-way
communication channel.  &lt;code&gt;mx&lt;/code&gt; and &lt;code&gt;my&lt;/code&gt; are maximal values for &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; coordinates.  Left of the
cell (&lt;code&gt;l&lt;/code&gt;) is current &lt;code&gt;x&lt;/code&gt; coordinate minus &lt;code&gt;1&lt;/code&gt;, unless we're on the leftmost column (&lt;code&gt;0&lt;/code&gt;), in which
case we get &lt;code&gt;mx&lt;/code&gt;.  Similarly for right cell, except we look if we're on the rightmost column (&lt;code&gt;mx&lt;/code&gt;),
and wrap to &lt;code&gt;0&lt;/code&gt; if we are.  Similarly for &lt;code&gt;y&lt;/code&gt; axis.  In short, referencing a cell off the edge gets
us a cell on the opposite side.
&lt;/p&gt;
&lt;p&gt;
Then for each pair of coordinates around our cell we get the value from the &lt;code&gt;cells&lt;/code&gt; array.  These
pairs are given by two lists: one fore &lt;code&gt;x&lt;/code&gt; coordinates and one for &lt;code&gt;y&lt;/code&gt; coordinates. Function
&lt;a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_mapc_.htm"&gt;&lt;code&gt;mapcar&lt;/code&gt;&lt;/a&gt; goes over both lists simultaneously and applies given function to each successive pair of
items from both lists.
&lt;/p&gt;

&lt;p&gt;
Also note how indices are passed to &lt;code&gt;aref&lt;/code&gt;, with &lt;code&gt;y&lt;/code&gt; and &lt;code&gt;x&lt;/code&gt; in unnatural positions.  This is for
the reasons explained above – the rows are first dimension, and columns the second.  But
&lt;code&gt;neighbours&lt;/code&gt; function expects them in the natural order, leaving the implementation details out of
the way.
&lt;/p&gt;
&lt;p&gt;
Let's check if our neighbours function works as expected:
&lt;/p&gt;



&lt;pre class="example"&gt;CL-USER&amp;gt; (neighbours (make-initialised-brain 7 5) 3 4)
(:OFF :OFF :OFF :OFF :OFF :OFF :ON :ON)
&lt;/pre&gt;




&lt;p&gt;
The resulting list is cell values for, respectively, left-up, up, right-up, left, right, left-down,
down and right-down cells from the specified &lt;code&gt;x&lt;/code&gt;, &lt;code&gt;y&lt;/code&gt; coordinate.
&lt;/p&gt;
&lt;p&gt;
What's left?  Evolution.  The function which will create next state of a &lt;i&gt;brain&lt;/i&gt;:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;evolve&lt;/span&gt; (src)
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let*&lt;/span&gt; ((w (array-dimension src 1))
&lt;span class="linenr"&gt;3:  &lt;/span&gt;         (h (array-dimension src 0))
&lt;span class="linenr"&gt;4:  &lt;/span&gt;         (dst (make-brain w h)))
&lt;span class="linenr"&gt;5:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; for j below h
&lt;span class="linenr"&gt;6:  &lt;/span&gt;       do (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; for i below w
&lt;span class="linenr"&gt;7:  &lt;/span&gt;             do (setf (aref dst j i)
&lt;span class="linenr"&gt;8:  &lt;/span&gt;                      (funcall 'rules (aref src j i) (neighbours src i j)))))
&lt;span class="linenr"&gt;9:  &lt;/span&gt;    dst))
&lt;/pre&gt;




&lt;p&gt;
Ordinary loop over rows and columns, setting values in newly created brain by applying the &lt;code&gt;rules&lt;/code&gt;
to a cell and its neighbours in the current brain.  We're ready to play now:
&lt;/p&gt;



&lt;pre class="example"&gt;CL-USER&amp;gt; (evolve (make-initialised-brain 7 5))
#2A((:OFF :OFF :OFF :DYING :DYING :OFF :OFF)
    (:OFF :OFF :OFF :ON    :ON    :OFF :OFF)
    (:OFF :OFF :OFF :OFF   :OFF   :OFF :OFF)
    (:OFF :OFF :OFF :OFF   :OFF   :OFF :OFF)
    (:OFF :OFF :OFF :ON    :ON    :OFF :OFF))
CL-USER&amp;gt; (evolve *)
#2A((:OFF :OFF :ON  :OFF   :OFF   :ON  :OFF)
    (:OFF :OFF :OFF :DYING :DYING :OFF :OFF)
    (:OFF :OFF :OFF :ON    :ON    :OFF :OFF)
    (:OFF :OFF :OFF :ON    :ON    :OFF :OFF)
    (:OFF :OFF :OFF :DYING :DYING :OFF :OFF))
&lt;/pre&gt;




&lt;p&gt;
Using numbers for cell values would be much better visually, but I'm staying close to the Clojure
version (for now).
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-2" class="outline-2"&gt;
&lt;h2 id="sec-2"&gt;Getting ready for blastoff &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;


&lt;p&gt;
Almost ready to do some timing.  All we need are the &lt;code&gt;simulate&lt;/code&gt; and &lt;code&gt;benchmark&lt;/code&gt; functions:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;simulate&lt;/span&gt; (steps initial)
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; repeat steps
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;     for brain = initial then (funcall 'evolve brain)
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;     finally (&lt;span style="color: #4682b4; font-weight: bold;"&gt;return&lt;/span&gt; brain)))
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;benchmark&lt;/span&gt; ()
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;  (format *trace-output* &lt;span style="color: #008b00;"&gt;"Benchmarking on ~A ~A~%"&lt;/span&gt;
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;          (lisp-implementation-type)
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;          (lisp-implementation-version))
&lt;span class="linenr"&gt;10:  &lt;/span&gt;  &lt;span style="color: #ffa500;"&gt;;; &lt;/span&gt;&lt;span style="color: #ffa500;"&gt;Warmup.
&lt;span class="linenr"&gt;11:  &lt;/span&gt;&lt;/span&gt;  (simulate 10000 (make-initialised-brain 16 16))
&lt;span class="linenr"&gt;12:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt;
&lt;span class="linenr"&gt;13:  &lt;/span&gt;     for (w h i) in '((32    32  32768)
&lt;span class="linenr"&gt;14:  &lt;/span&gt;                      (64    64  8192)
&lt;span class="linenr"&gt;15:  &lt;/span&gt;                      (128  128  2048)
&lt;span class="linenr"&gt;16:  &lt;/span&gt;                      (256  256  512)
&lt;span class="linenr"&gt;17:  &lt;/span&gt;                      (512  512  128)
&lt;span class="linenr"&gt;18:  &lt;/span&gt;                      (1024 1024 32)
&lt;span class="linenr"&gt;19:  &lt;/span&gt;                      (2048 2048 8)
&lt;span class="linenr"&gt;20:  &lt;/span&gt;                      (4096 4096 2))
&lt;span class="linenr"&gt;21:  &lt;/span&gt;     do (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((initial (make-initialised-brain w h)))
&lt;span class="linenr"&gt;22:  &lt;/span&gt;          (format *trace-output* &lt;span style="color: #008b00;"&gt;"*** ~Dx~D ~D iteration~:P ***~%"&lt;/span&gt; w h i)
&lt;span class="linenr"&gt;23:  &lt;/span&gt;          (time (simulate i initial))
&lt;span class="linenr"&gt;24:  &lt;/span&gt;          (finish-output *trace-output*)))
&lt;span class="linenr"&gt;25:  &lt;/span&gt;  (values))
&lt;/pre&gt;




&lt;p&gt;
Notice that there is not a single type annotation in this code.  Running it on my laptop&lt;sup&gt;&lt;a class="footref" name="fnr.1" href="#fn.1"&gt;1&lt;/a&gt;&lt;/sup&gt;:
&lt;/p&gt;



&lt;pre class="example"&gt;CL-USER&amp;gt; (benchmark)
Benchmarking on SBCL 1.0.31.32
*** 32x32 32768 iterations ***
Evaluation took:
  34.782 seconds of real time
  34.064263 seconds of total run time (33.060215 user, 1.004048 system)
  [ Run times consist of 7.670 seconds GC time, and 26.395 seconds non-GC time. ]
  97.94% CPU
  96,906,498,234 processor cycles
  14,769,770,512 bytes consed

*** 64x64 8192 iterations ***
Evaluation took:
  33.512 seconds of real time
  32.776229 seconds of total run time (31.254474 user, 1.521755 system)
  [ Run times consist of 5.903 seconds GC time, and 26.874 seconds non-GC time. ]
  97.80% CPU
  93,366,482,790 processor cycles
  14,762,592,768 bytes consed
...
snipped
&lt;/pre&gt;




&lt;p&gt;
Running from the terminal would look something like this:
&lt;/p&gt;



&lt;pre class="example"&gt;$ sbcl --noinform --disable-debugger --load simple.fasl --eval "(benchmark)" --eval "(quit)"
&lt;/pre&gt;




&lt;p&gt;
This time we have more information to display in the graphs: total run time and GC time.  And just
look at the numbers:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-sbcl.png"  alt="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-sbcl.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
Interesting.  Very interesting, indeed.  What we see here is that the time to run one iteration
(blue bars) is increasing a bit first and then declines quite sharply.  GC time (green), on the
other hand, increases quite sharply starting from 256x256 simulation.  Can you come up with an
explanation?
&lt;/p&gt;
&lt;p&gt;
I have one.  Smaller arrays are processed quite fast, so are short lived, and become garbage before
GC kicks in.  The bigger arrays are processed longer, so they are &lt;i&gt;alive&lt;/i&gt; when GC starts.  And GC
has to walk all array elements each time, since arrays are not specialised (that is, can contain
anything).  I told you using numbers would be better (for different reasons, though)!
&lt;/p&gt;
&lt;p&gt;
I also did another run of this same benchmark (transcripts available below), and the numbers were
the same up to sub-second precision.
&lt;/p&gt;
&lt;p&gt;
Let's do this same thing with a different Common Lisp implementation, which in my case will be
&lt;a href="http://www.clozure.com/clozurecl.html"&gt;Clozure CL&lt;/a&gt;.  The nice thing about this implementation that its compiler is very snappy.  Running
from shell looks like this:
&lt;/p&gt;



&lt;pre class="example"&gt;$ ccl -n -Q -l simple.dx32fsl -e "(benchmark)"
&lt;/pre&gt;




&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl.png"  alt="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
One thing to note is that arrays of size 4096x4096 exceed the limit of array length (24-bit number)
of 32-bit Clozure CL, which, coincidentally, is just 1 short of what we need:
&lt;/p&gt;



&lt;pre class="example"&gt;CL-USER&amp;gt; array-dimension-limit
16777216
CL-USER&amp;gt; (integer-length array-dimension-limit)
25
CL-USER&amp;gt; (integer-length (1- array-dimension-limit))
24
CL-USER&amp;gt; (* 4096 4096)
16777216
&lt;/pre&gt;




&lt;p&gt;
But otherwise there is less variation in 32-bit Clozure CL.  Let's look at 64-bit version:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl64.png"  alt="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl64.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
Nice, behaviour similar to SBCL, except that GC times don't grow as fast.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-3" class="outline-2"&gt;
&lt;h2 id="sec-3"&gt;Missing things &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;


&lt;p&gt;
I hear somebody in the corner mumbling something about animation and graphics.  Oh, right.  Missed
that one.  But it so happens that the very thing that made me start playing with Brian's Brain is
that I installed &lt;a href="http://common-lisp.net/project/cl-opengl/"&gt;cl-opengl&lt;/a&gt;.  And guess what?  It worked right out of the box.  On all the Common
Lisp implementation I have on my computer.  So I had start playing with it.  And the rest, as they
say, is history.
&lt;/p&gt;
&lt;p&gt;
It works like this:
&lt;/p&gt;



&lt;pre class="example"&gt;CL-USER&amp;gt; (asdf:operate 'asdf:load-op :cl-glut-examples)
; System loading output snipped...
NIL
CL-USER&amp;gt; (cl-glut-examples:run-examples)
&lt;/pre&gt;




&lt;p&gt;
And all the examples pop up, many of them animating.  So I peek at some examples to see how to set
up a window.  Easy as a pie:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defclass&lt;/span&gt; &lt;span style="color: #0000ff; font-weight: bold;"&gt;bb&lt;/span&gt; (glut:window)
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;  ((cells &lt;span style="color: #cd0000;"&gt;:accessor&lt;/span&gt; cells-of &lt;span style="color: #cd0000;"&gt;:initarg&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:cells&lt;/span&gt;))
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;  (&lt;span style="color: #cd0000;"&gt;:default-initargs&lt;/span&gt;
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;   &lt;span style="color: #cd0000;"&gt;:title&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"Brian's Brain in CL"&lt;/span&gt;
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;   &lt;span style="color: #cd0000;"&gt;:mode&lt;/span&gt; '(&lt;span style="color: #cd0000;"&gt;:double&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:rgb&lt;/span&gt;)))
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defmethod&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;glut:display-window&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:before&lt;/span&gt; ((w bb))
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;  (gl:clear-color 0 0 0 0)
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;  (gl:matrix-mode &lt;span style="color: #cd0000;"&gt;:projection&lt;/span&gt;)
&lt;span class="linenr"&gt;10:  &lt;/span&gt;  (gl:load-identity)
&lt;span class="linenr"&gt;11:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((cells (cells-of w)))
&lt;span class="linenr"&gt;12:  &lt;/span&gt;   (gl:ortho 0 (array-dimension cells 1)  0 (array-dimension cells 0) -1 1)))
&lt;/pre&gt;




&lt;p&gt;
Then we need a function to render a single cell at specified position.  Drawing squares is easy
enough:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;render-cell&lt;/span&gt; (x y cell)
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;flet&lt;/span&gt; ((draw-cell (x y)
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;           (gl:with-pushed-matrix
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;               (gl:translate x y 0)
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;             (gl:with-primitive &lt;span style="color: #cd0000;"&gt;:polygon&lt;/span&gt;
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;               (gl:vertex 0.1 0.1 0)
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;               (gl:vertex 0.9 0.1 0)
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;               (gl:vertex 0.9 0.9 0)
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;               (gl:vertex 0.1 0.9 0)))))
&lt;span class="linenr"&gt;10:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;case&lt;/span&gt; cell
&lt;span class="linenr"&gt;11:  &lt;/span&gt;      (&lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt; (gl:color 1 1 1)
&lt;span class="linenr"&gt;12:  &lt;/span&gt;           (draw-cell x y))
&lt;span class="linenr"&gt;13:  &lt;/span&gt;      (&lt;span style="color: #cd0000;"&gt;:dying&lt;/span&gt; (gl:color 0.5 0.5 0.5)
&lt;span class="linenr"&gt;14:  &lt;/span&gt;              (draw-cell x y)))))
&lt;/pre&gt;




&lt;p&gt;
All that's left are some callbacks to draw the whole window and run the animation.  The following
two methods will do just fine:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defmethod&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;glut:display&lt;/span&gt; ((w bb))
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;  (gl:clear &lt;span style="color: #cd0000;"&gt;:color-buffer&lt;/span&gt;)
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let*&lt;/span&gt; ((cells (cells-of w))
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;         (w (array-dimension cells 1))
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;         (h (array-dimension cells 0)))
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; for j below h
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;       do (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; for i below w
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;             do (render-cell i j (aref cells j i)))))
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;  (glut:swap-buffers))
&lt;span class="linenr"&gt;10:  &lt;/span&gt;
&lt;span class="linenr"&gt;11:  &lt;/span&gt;
&lt;span class="linenr"&gt;12:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defmethod&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;glut:idle&lt;/span&gt; ((w bb))
&lt;span class="linenr"&gt;13:  &lt;/span&gt;  (setf (cells-of w) (evolve (cells-of w)))
&lt;span class="linenr"&gt;14:  &lt;/span&gt;  (glut:post-redisplay))
&lt;/pre&gt;




&lt;p&gt;
Everything is ready now.  An animated Brian's Brain can be created like this:
&lt;/p&gt;



&lt;pre class="example"&gt;(glut:display-window (make-instance 'bb
                                    :cells (make-initialised-brain 128 128)
                                    :width 512
                                    :height 512))
&lt;/pre&gt;




&lt;p&gt;
But since I don't like to put things on the toplevel which run when just loading a file, I'll put
the code into a function:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;run&lt;/span&gt; (w h ww wh)
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (glut:display-window
&lt;span class="linenr"&gt;3:  &lt;/span&gt;   (make-instance 'bb
&lt;span class="linenr"&gt;4:  &lt;/span&gt;                  &lt;span style="color: #cd0000;"&gt;:cells&lt;/span&gt; (make-initialised-brain w h)
&lt;span class="linenr"&gt;5:  &lt;/span&gt;                  &lt;span style="color: #cd0000;"&gt;:width&lt;/span&gt; ww
&lt;span class="linenr"&gt;6:  &lt;/span&gt;                  &lt;span style="color: #cd0000;"&gt;:height&lt;/span&gt; wh)))
&lt;/pre&gt;




&lt;p&gt;
Feel free to start a never-ending Brian's Brain simulation:
&lt;/p&gt;



&lt;pre class="example"&gt;CL-USER&amp;gt; (run 160 100 320 200)
&lt;/pre&gt;




&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/2-bb-cl/bb-160x100.png"  alt="http://www.ltn.lv/~jonis/blog/2-bb-cl/bb-160x100.png" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-4" class="outline-2"&gt;
&lt;h2 id="sec-4"&gt;Finishing touches &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;


&lt;p&gt;
To make this all easily loadable I'll put all code in its own package and create a system definition
(which nowadays means a ASDF) file.  Refer to Xach's &lt;a href="http://xach.livejournal.com/130040.html"&gt;intro&lt;/a&gt; for a nice description of why and how to
do this.
&lt;/p&gt;
&lt;p&gt;
Package definition is very simple:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defpackage&lt;/span&gt; &lt;span style="color: #0000ff; font-weight: bold;"&gt;:brians-brain-1&lt;/span&gt;
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #cd0000;"&gt;:use&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:common-lisp&lt;/span&gt;)
&lt;span class="linenr"&gt;3:  &lt;/span&gt;  (&lt;span style="color: #cd0000;"&gt;:export&lt;/span&gt; #&lt;span style="color: #cd0000;"&gt;:run&lt;/span&gt;))
&lt;/pre&gt;




&lt;p&gt;
And the system definition is nothing complicated, either:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(asdf:defsystem &lt;span style="color: #cd0000;"&gt;:brians-brain-1&lt;/span&gt;
&lt;span class="linenr"&gt;2:  &lt;/span&gt;    &lt;span style="color: #cd0000;"&gt;:version&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"1.0"&lt;/span&gt;
&lt;span class="linenr"&gt;3:  &lt;/span&gt;    &lt;span style="color: #cd0000;"&gt;:author&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"Jānis Džeriņš"&lt;/span&gt;
&lt;span class="linenr"&gt;4:  &lt;/span&gt;    &lt;span style="color: #cd0000;"&gt;:license&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"Send me money if you find this stuff useful."&lt;/span&gt;
&lt;span class="linenr"&gt;5:  &lt;/span&gt;    &lt;span style="color: #cd0000;"&gt;:depends-on&lt;/span&gt; (cl-opengl cl-glut)
&lt;span class="linenr"&gt;6:  &lt;/span&gt;    &lt;span style="color: #cd0000;"&gt;:components&lt;/span&gt; ((&lt;span style="color: #cd0000;"&gt;:file&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"package"&lt;/span&gt;)
&lt;span class="linenr"&gt;7:  &lt;/span&gt;                 (&lt;span style="color: #cd0000;"&gt;:file&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"simple"&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:depends-on&lt;/span&gt; (&lt;span style="color: #008b00;"&gt;"package"&lt;/span&gt;))
&lt;span class="linenr"&gt;8:  &lt;/span&gt;                 (&lt;span style="color: #cd0000;"&gt;:file&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"display"&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:depends-on&lt;/span&gt; (&lt;span style="color: #008b00;"&gt;"package"&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"simple"&lt;/span&gt;))))
&lt;/pre&gt;




&lt;p&gt;
At this point we can get to a running animated Brian's Brain from the shell prompt:
&lt;/p&gt;



&lt;pre class="example"&gt;$ sbcl --noinform --disable-debugger \
       --eval "(asdf:operate 'asdf:load-op :brians-brain-1)" \
       --eval "(brians-brain-1:run 160 100 320 200)" \
       --eval "(quit)"
&lt;/pre&gt;




&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-5" class="outline-2"&gt;
&lt;h2 id="sec-5"&gt;Looking forward &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;


&lt;p&gt;
Next time I'm going to play with different brain representations.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-6" class="outline-2"&gt;
&lt;h2 id="sec-6"&gt;Data files &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;


&lt;p&gt;
Code:
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/2-bb-cl/brians-brain-1.asd"&gt;brians-brain-1.asd&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/2-bb-cl/package.lisp"&gt;package.lisp&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/2-bb-cl/simple.lisp"&gt;simple.lisp&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/2-bb-cl/display.lisp"&gt;display.lisp&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
Benchmarking transcripts:
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-sbcl-mbp.txt"&gt;bm-sbcl-mbp.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl-mbp.txt"&gt;bm-ccl-mbp.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl64-mbp.txt"&gt;bm-ccl64-mbp.txt&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-7" class="outline-2"&gt;
&lt;h2 id="sec-7"&gt;Updates &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-7"&gt;


&lt;p&gt;
2009-10-28: Noticed a bug in &lt;code&gt;simulate&lt;/code&gt; function which runs the simulation for one step less than
asked.  The corrected version looks like this:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;simulate&lt;/span&gt; (steps initial)
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; with brain = initial
&lt;span class="linenr"&gt;3:  &lt;/span&gt;     repeat steps
&lt;span class="linenr"&gt;4:  &lt;/span&gt;     do (setf brain (funcall 'evolve brain))
&lt;span class="linenr"&gt;5:  &lt;/span&gt;     finally (&lt;span style="color: #4682b4; font-weight: bold;"&gt;return&lt;/span&gt; brain)))
&lt;/pre&gt;




&lt;p&gt;
Also now invoking GC before each simulation so that garbage from previous simulation has less chance
to influence the next:
&lt;/p&gt;



&lt;pre class="src src-lisp"&gt;&lt;span class="linenr"&gt; 6:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defun&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;benchmark&lt;/span&gt; ()
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;  (format *trace-output* &lt;span style="color: #008b00;"&gt;"Benchmarking on ~A ~A~%"&lt;/span&gt;
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;          (lisp-implementation-type)
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;          (lisp-implementation-version))
&lt;span class="linenr"&gt;10:  &lt;/span&gt;  &lt;span style="color: #ffa500;"&gt;;; &lt;/span&gt;&lt;span style="color: #ffa500;"&gt;Warmup.
&lt;span class="linenr"&gt;11:  &lt;/span&gt;&lt;/span&gt;  (simulate 10000 (make-initialised-brain 16 16))
&lt;span class="linenr"&gt;12:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt;
&lt;span class="linenr"&gt;13:  &lt;/span&gt;     for (w h i) in '((32    32  32768)
&lt;span class="linenr"&gt;14:  &lt;/span&gt;                      (64    64  8192)
&lt;span class="linenr"&gt;15:  &lt;/span&gt;                      (128  128  2048)
&lt;span class="linenr"&gt;16:  &lt;/span&gt;                      (256  256  512)
&lt;span class="linenr"&gt;17:  &lt;/span&gt;                      (512  512  128)
&lt;span class="linenr"&gt;18:  &lt;/span&gt;                      (1024 1024 32)
&lt;span class="linenr"&gt;19:  &lt;/span&gt;                      (2048 2048 8)
&lt;span class="linenr"&gt;20:  &lt;/span&gt;                      (4096 4096 2))
&lt;span class="linenr"&gt;21:  &lt;/span&gt;     do #+ccl (gc)
&lt;span class="linenr"&gt;22:  &lt;/span&gt;        #+sbcl (gc &lt;span style="color: #cd0000;"&gt;:full&lt;/span&gt; t)
&lt;span class="linenr"&gt;23:  &lt;/span&gt;        (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; ((initial (make-initialised-brain w h)))
&lt;span class="linenr"&gt;24:  &lt;/span&gt;         (format *trace-output* &lt;span style="color: #008b00;"&gt;"*** ~Dx~D ~D iteration~:P ***~%"&lt;/span&gt; w h i)
&lt;span class="linenr"&gt;25:  &lt;/span&gt;         (time (simulate i initial))
&lt;span class="linenr"&gt;26:  &lt;/span&gt;         (finish-output *trace-output*)))
&lt;span class="linenr"&gt;27:  &lt;/span&gt;  (values))
&lt;/pre&gt;




&lt;p&gt;
The graphs are generally very similar, except the rightmost columns don't look fishy:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-sbcl-2.png"  alt="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-sbcl-2.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl32-2.png"  alt="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl32-2.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl64-2.png"  alt="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl64-2.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
New transcripts:
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-sbcl-2.txt"&gt;bm-sbcl-2.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl-2.txt"&gt;bm-ccl32-2.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://www.ltn.lv/~jonis/blog/2-bb-cl/bm-ccl64-2.txt"&gt;bm-ccl64-2.txt&lt;/a&gt;
&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;
&lt;p class="footnote"&gt;&lt;sup&gt;&lt;a class="footnum" name="fn.1" href="#fnr.1"&gt;1&lt;/a&gt;&lt;/sup&gt; See the &lt;a href="http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-clojure.html"&gt;previous&lt;/a&gt; blog entry for the &lt;a href="http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-clojure.html#fn.2"&gt;specs&lt;/a&gt;.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6424155829455583334-2881684569615265161?l=t-b-o-g.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://t-b-o-g.blogspot.com/feeds/2881684569615265161/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-common-lisp.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6424155829455583334/posts/default/2881684569615265161'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6424155829455583334/posts/default/2881684569615265161'/><link rel='alternate' type='text/html' href='http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-common-lisp.html' title='Brian&apos;s Brain on Common Lisp'/><author><name>Jānis Džeriņš</name><uri>http://www.blogger.com/profile/06846740575608764708</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6424155829455583334.post-6799044333746492652</id><published>2009-10-26T13:05:00.009+02:00</published><updated>2009-10-28T13:25:27.956+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Clojure'/><title type='text'>Brian's Brain on Clojure</title><content type='html'>&lt;div id="content"&gt;

&lt;div id="outline-container-1" class="outline-2"&gt;
&lt;h2 id="sec-1"&gt;Here's your brain, and here's your brain on Clojure &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;


&lt;p&gt;
A couple weeks ago I came across a blog entry by &lt;a href="http://blog.bestinclass.dk/"&gt;Lau B. Jensen&lt;/a&gt; about &lt;a href="http://en.wikipedia.org/wiki/Brian%27s_Brain"&gt;Brian's Brain&lt;/a&gt; &lt;a href="http://blog.bestinclass.dk/index.php/2009/10/brians-functional-brain/"&gt;implementation&lt;/a&gt; in
&lt;a href="http://clojure.org/"&gt;Clojure&lt;/a&gt;.  And so it happened that briefly before this I have decided to devote some of my time to
programming in &lt;a href="http://en.wikipedia.org/wiki/ANSI_Common_Lisp"&gt;Common Lisp&lt;/a&gt;.  What a lucky coincidence – this looks like a perfect job to swap Common
Lisp back into the operational area of &lt;i&gt;my&lt;/i&gt; brain.
&lt;/p&gt;
&lt;p&gt;
You see, implementing &lt;a href="http://en.wikipedia.org/wiki/Brian%27s_Brain"&gt;Brian's Brain&lt;/a&gt; is easy.  So easy that it is not worthy of a blog post except if
one spices it up with some orthogonal material.  The reference blog entry mentioned above does just
that: implement something simple to tell everybody how cool some features of Clojure are.  (And &lt;a href="http://blog.bestinclass.dk/index.php/2009/10/brians-transient-brain/"&gt;then tell&lt;/a&gt; that those features come at a drastic performance price.)  I smell a recipe for a successful
first blog entry: controversy!  And the best harmless controversy one can get on the internet is
apples to oranges comparisons.
&lt;/p&gt;
&lt;p&gt;
So, as you might expect I'm going to compare the Clojure and Common Lisp implementations&lt;sup&gt;&lt;a class="footref" name="fnr.1" href="#fn.1"&gt;1&lt;/a&gt;&lt;/sup&gt; of
&lt;a href="http://en.wikipedia.org/wiki/Brian%27s_Brain"&gt;Brian's Brain&lt;/a&gt;.  And today I'll give you some numbers about the Clojure implementations (both
&lt;a href="http://blog.bestinclass.dk/index.php/2009/10/brians-functional-brain/"&gt;original&lt;/a&gt; and &lt;a href="http://blog.bestinclass.dk/index.php/2009/10/brians-transient-brain/"&gt;transient&lt;/a&gt;).  And to make apples and oranges more similar I'm going to leave only the
cell simulation code, leaving graphics and rendering out of the discussion.  For a while.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-2" class="outline-2"&gt;
&lt;h2 id="sec-2"&gt;Let's measure something &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;


&lt;p&gt;
The simulation and measuring is going to be simple: I'll measure the time it takes to simulate
&lt;i&gt;brains&lt;/i&gt; of various sizes for a set number of iterations.  Since it is apparent that the time to
advance the state of &lt;i&gt;brain&lt;/i&gt; one step is proportional to the size of the &lt;i&gt;brain&lt;/i&gt;, it is reasonable
to expect the time spent to increase proportionally to the number of cells in the &lt;i&gt;brain&lt;/i&gt;.  And
since all of the &lt;i&gt;brain's&lt;/i&gt; cells are the same, there is nothing interesting to expect in the
resulting numbers.
&lt;/p&gt;
&lt;p&gt;
The only interesting thing I'd expect from varying &lt;i&gt;brain&lt;/i&gt; sizes is the impact of runtime's memory
management and maybe cache locality effects.  Looking at these things is also interesting because
the &lt;i&gt;brain&lt;/i&gt; representation in both Lau's implementations are different: [lazy] lists in the original
version and a vector in the transient version.  Therefore I'll run the code for brains of different
sizes, but adjusting the number of simulation steps so that the number of cells processed stays the
same.  This way I'd expect graphs be more-or-less straight horizontal.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-3" class="outline-2"&gt;
&lt;h2 id="sec-3"&gt;But first &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;


&lt;p&gt;
The code to run simulation, nothing complicated:
&lt;/p&gt;



&lt;pre class="src src-clojure"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;simulate&lt;/span&gt; [steps initial f]
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; [n steps, brain initial]
&lt;span class="linenr"&gt;3:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (&amp;lt; 0 n)
&lt;span class="linenr"&gt;4:  &lt;/span&gt;      (&lt;span style="color: #4682b4; font-weight: bold;"&gt;recur&lt;/span&gt; (dec n) (step brain f))
&lt;span class="linenr"&gt;5:  &lt;/span&gt;      brain)))
&lt;/pre&gt;




&lt;p&gt;
This is a typical Clojure loop that has two "state" variables, which are updated on each iteration
(by means of &lt;code&gt;recur&lt;/code&gt; invocation): n is decreased, and brain is "transformed" using the &lt;code&gt;step&lt;/code&gt;
function.  That is, on each iteration of the loop new values for these two variables are used; no
variables (or values) are really modified.
&lt;/p&gt;
&lt;p&gt;
The value of &lt;code&gt;f&lt;/code&gt; is either &lt;code&gt;map&lt;/code&gt; or &lt;code&gt;pmap&lt;/code&gt;.  This way it is easy to run a simulation in serial or
parallel fasion by just passing a different function to the simulation.  The original &lt;code&gt;step&lt;/code&gt;
function accepts only a single parameter:
&lt;/p&gt;



&lt;pre class="src src-clojure"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;step&lt;/span&gt; [board]
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;doall&lt;/span&gt;
&lt;span class="linenr"&gt;3:  &lt;/span&gt;   (&lt;span style="color: #cd0000;"&gt;map&lt;/span&gt; (&lt;span style="color: #4682b4; font-weight: bold;"&gt;fn&lt;/span&gt; [window]
&lt;span class="linenr"&gt;4:  &lt;/span&gt;          (&lt;span style="color: #cd0000;"&gt;apply&lt;/span&gt; #(&lt;span style="color: #4682b4; font-weight: bold;"&gt;doall&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;apply&lt;/span&gt; map rules %&amp;amp;))
&lt;span class="linenr"&gt;5:  &lt;/span&gt;                 (&lt;span style="color: #4682b4; font-weight: bold;"&gt;doall&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;map&lt;/span&gt; torus-window window))))
&lt;span class="linenr"&gt;6:  &lt;/span&gt;        (torus-window board))))
&lt;/pre&gt;




&lt;p&gt;
In his article Lau suggests that to get the parallel version of this function only one function call
must be changed.  Except that after the change the function must be recompiled.  Contrast this to my
changed version:
&lt;/p&gt;



&lt;pre class="src src-clojure"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;step&lt;/span&gt; [board f]
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;doall&lt;/span&gt;
&lt;span class="linenr"&gt;3:  &lt;/span&gt;   (f (&lt;span style="color: #4682b4; font-weight: bold;"&gt;fn&lt;/span&gt; [window]
&lt;span class="linenr"&gt;4:  &lt;/span&gt;        (&lt;span style="color: #cd0000;"&gt;apply&lt;/span&gt; #(&lt;span style="color: #4682b4; font-weight: bold;"&gt;doall&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;apply&lt;/span&gt; map rules %&amp;amp;))
&lt;span class="linenr"&gt;5:  &lt;/span&gt;               (&lt;span style="color: #4682b4; font-weight: bold;"&gt;doall&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;map&lt;/span&gt; torus-window window))))
&lt;span class="linenr"&gt;6:  &lt;/span&gt;      (torus-window board))))
&lt;/pre&gt;




&lt;p&gt;
Now nothing must be recompiled when we decide to change this function from serial to parallel
version.
&lt;/p&gt;
&lt;p&gt;
Another thing I had to change is how the initial &lt;i&gt;brain&lt;/i&gt; is created.  Lau uses global variables
quite a lot (and we'll see how it hurts on many occassions ahead).  Besides not being a nice thing
in general, it also is very un-lispy (and un-functional as well, but I'll have an article on this
some other time).  There is nothing in Lau's code that requires global variables, so I re-wrote his
code to get rid of them.
&lt;/p&gt;
&lt;p&gt;
This code:
&lt;/p&gt;



&lt;pre class="src src-clojure"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;dim-board&lt;/span&gt; [90 90])
&lt;span class="linenr"&gt;2:  &lt;/span&gt;
&lt;span class="linenr"&gt;3:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;board&lt;/span&gt;
&lt;span class="linenr"&gt;4:  &lt;/span&gt;     (&lt;span style="color: #4682b4; font-weight: bold;"&gt;for&lt;/span&gt; [x (&lt;span style="color: #cd0000;"&gt;range&lt;/span&gt; (dim-board 0))]
&lt;span class="linenr"&gt;5:  &lt;/span&gt;       (&lt;span style="color: #4682b4; font-weight: bold;"&gt;for&lt;/span&gt; [y (&lt;span style="color: #cd0000;"&gt;range&lt;/span&gt; (dim-board 1))]
&lt;span class="linenr"&gt;6:  &lt;/span&gt;         [(&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (&amp;lt; 50 (rand-int 100)) &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;) x y])))
&lt;/pre&gt;




&lt;p&gt;
now has become this:
&lt;/p&gt;



&lt;pre class="src src-clojure"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;make-random-board&lt;/span&gt; [w h]
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;for&lt;/span&gt; [x (&lt;span style="color: #cd0000;"&gt;range&lt;/span&gt; w)]
&lt;span class="linenr"&gt;3:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;for&lt;/span&gt; [y (&lt;span style="color: #cd0000;"&gt;range&lt;/span&gt; h)]
&lt;span class="linenr"&gt;4:  &lt;/span&gt;      [(&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (&amp;lt; 50 (rand-int 100)) &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;) x y])))
&lt;/pre&gt;




&lt;p&gt;
Now we're ready to experiment with various combinations of &lt;i&gt;brain&lt;/i&gt; size and iteration counts in the
&lt;a href="http://en.wikipedia.org/wiki/Read-eval-print_loop"&gt;REPL&lt;/a&gt; (manually-reformatted by me for aesthetic reasons):
&lt;/p&gt;



&lt;pre class="example"&gt;user&amp;gt; (simulate 0 (make-random-board 4 3) map)
(([:on  0 0] [:on  0 1] [:on  0 2])
 ([:on  1 0] [:on  1 1] [:off 1 2])
 ([:on  2 0] [:off 2 1] [:off 2 2])
 ([:off 3 0] [:off 3 1] [:on  3 2]))
user&amp;gt; (simulate 1 (make-random-board 4 3) pmap)
(([:off   0 0] [:dying 0 1] [:off   0 2])
 ([:off   1 0] [:dying 1 1] [:dying 1 2])
 ([:dying 2 0] [:off   2 1] [:off   2 2])
 ([:dying 3 0] [:dying 3 1] [:off   3 2]))
user&amp;gt; (simulate 2 (make-random-board 4 3) map)
(([:off 0 0] [:off 0 1] [:off 0 2])
 ([:off 1 0] [:off 1 1] [:off 1 2])
 ([:off 2 0] [:off 2 1] [:off 2 2])
 ([:off 3 0] [:off 3 1] [:off 3 2]))
&lt;/pre&gt;




&lt;p&gt;
One interesting observation: the board after 2 iterations looks broken (all cells are dead).  Is the
code somehow broken?  Things like these are easy to find out interactively.  But only if it's easy
to specify various parameters.  And that's where the global variables make life hard.  In Lau's
implementation, for each of the little &lt;i&gt;REPL&lt;/i&gt; tests I'd have to go and redefine two global variables
(in order).  In my version one can play to their heart content without leaving the &lt;i&gt;REPL&lt;/i&gt;.
&lt;/p&gt;
&lt;p&gt;
So, what exactly is going on here?  Nothing special: having half the &lt;i&gt;brain's&lt;/i&gt; cells alive at the
start makes this little &lt;i&gt;brain&lt;/i&gt; overpopulated and by the rules of the world all cells die.  And
since I'm not going to manually perform pretty-printing or write a function to do it for me, I'll
show another advantage of avoiding unnecessary global variables.
&lt;/p&gt;
&lt;p&gt;
You see, for doing reliable experiments (like this one I'm doing here) they must be repeatable.  So
using randomly-initialised &lt;i&gt;brains&lt;/i&gt; is not going to work (unless a specific
pseudo-random-number-generator is seeded with some known seed, but that's besides the point).  And
it so happens that the other Lau's implementation uses a specific initial &lt;i&gt;brain&lt;/i&gt; setup: two
adjacent cells in the first row are initially alive.  The code to create such a &lt;i&gt;brain&lt;/i&gt; is:
&lt;/p&gt;



&lt;pre class="src src-clojure"&gt;&lt;span class="linenr"&gt;1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;make-special-board&lt;/span&gt; [w h]
&lt;span class="linenr"&gt;2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; [mid (/ w 2)]
&lt;span class="linenr"&gt;3:  &lt;/span&gt;   (&lt;span style="color: #4682b4; font-weight: bold;"&gt;for&lt;/span&gt; [x (&lt;span style="color: #cd0000;"&gt;range&lt;/span&gt; w)]
&lt;span class="linenr"&gt;4:  &lt;/span&gt;     (&lt;span style="color: #4682b4; font-weight: bold;"&gt;for&lt;/span&gt; [y (&lt;span style="color: #cd0000;"&gt;range&lt;/span&gt; h)]
&lt;span class="linenr"&gt;5:  &lt;/span&gt;       [(&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (&lt;span style="color: #4682b4; font-weight: bold;"&gt;and&lt;/span&gt; (= y 0) (&amp;lt;= mid x (&lt;span style="color: #cd0000;"&gt;inc&lt;/span&gt; mid))) &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt; &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;) x y]))))
&lt;/pre&gt;




&lt;p&gt;
Now, let's play a bit in the &lt;i&gt;REPL&lt;/i&gt;:
&lt;/p&gt;



&lt;pre class="example"&gt;user&amp;gt; (simulate 0 (make-special-board 4 3) map)
(([:off 0 0] [:off 0 1] [:off 0 2])
 ([:off 1 0] [:off 1 1] [:off 1 2])
 ([:on  2 0] [:off 2 1] [:off 2 2])
 ([:on  3 0] [:off 3 1] [:off 3 2]))
user&amp;gt; (simulate 1 (make-special-board 4 3) map)
(([:off   0 0] [:off 0 1] [:off 0 2])
 ([:off   1 0] [:off 1 1] [:off 1 2])
 ([:dying 2 0] [:on  2 1] [:on  2 2])
 ([:dying 3 0] [:on  3 1] [:on  3 2]))
user&amp;gt; (simulate 2 (make-special-board 4 3) map)
(([:on  0 0] [:on    0 1] [:on    0 2])
 ([:on  1 0] [:on    1 1] [:on    1 2])
 ([:off 2 0] [:dying 2 1] [:dying 2 2])
 ([:off 3 0] [:dying 3 1] [:dying 3 2]))
user&amp;gt; (= *1 (simulate 2 (make-special-board 4 3) pmap))
true
user&amp;gt; (= (simulate 10 (make-special-board 32 32) map)
         (simulate 10 (make-special-board 32 32) pmap))
true
&lt;/pre&gt;




&lt;p&gt;
Great, not only can we experiment with different &lt;i&gt;brain&lt;/i&gt; dimensions but also with different
contents.  And we can easily check if parallel and sequential versions work the same.  Now try doing
this in the Lau's original version!
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-4" class="outline-2"&gt;
&lt;h2 id="sec-4"&gt;Almost there &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;


&lt;p&gt;
Now we are almost ready to measure things.  Here's the function I'll use:
&lt;/p&gt;



&lt;pre class="src src-clojure"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;benchmark&lt;/span&gt; []
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;  &lt;span style="color: #ffa500;"&gt;;; &lt;/span&gt;&lt;span style="color: #ffa500;"&gt;Warmup
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;&lt;/span&gt;  (simulate 7500 (make-special-board 16 16) map)
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;  (simulate 7500 (make-special-board 16 16) pmap)
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;  (println (clojure-version))
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;doseq&lt;/span&gt; [[w h i] [[32    32  32768]
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;                   [64    64  8192]
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;                   [128  128  2048]
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;                   [256  256  512]
&lt;span class="linenr"&gt;10:  &lt;/span&gt;                   [512  512  128]]]
&lt;span class="linenr"&gt;11:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; [initial (&lt;span style="color: #4682b4; font-weight: bold;"&gt;doall&lt;/span&gt; (make-special-board w h))]
&lt;span class="linenr"&gt;12:  &lt;/span&gt;      (print (&lt;span style="color: #cd0000;"&gt;str&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"S "&lt;/span&gt; w &lt;span style="color: #008b00;"&gt;"x"&lt;/span&gt; h &lt;span style="color: #008b00;"&gt;", "&lt;/span&gt; i &lt;span style="color: #008b00;"&gt;" iteration(s): "&lt;/span&gt;))
&lt;span class="linenr"&gt;13:  &lt;/span&gt;      (&lt;span style="color: #cd0000;"&gt;time&lt;/span&gt; (simulate i initial map))
&lt;span class="linenr"&gt;14:  &lt;/span&gt;      (flush)
&lt;span class="linenr"&gt;15:  &lt;/span&gt;      (print (&lt;span style="color: #cd0000;"&gt;str&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"P "&lt;/span&gt; w &lt;span style="color: #008b00;"&gt;"x"&lt;/span&gt; h &lt;span style="color: #008b00;"&gt;", "&lt;/span&gt; i &lt;span style="color: #008b00;"&gt;" iteration(s): "&lt;/span&gt;))
&lt;span class="linenr"&gt;16:  &lt;/span&gt;      (&lt;span style="color: #cd0000;"&gt;time&lt;/span&gt; (simulate i initial pmap))
&lt;span class="linenr"&gt;17:  &lt;/span&gt;      (flush))))
&lt;/pre&gt;




&lt;p&gt;
Nothing special, repeat serial and parallel versions for given brain dimensions and iteration
number.  The results of running this on my MacBook Pro&lt;sup&gt;&lt;a class="footref" name="fnr.2" href="#fn.2"&gt;2&lt;/a&gt;&lt;/sup&gt; (with the warmup timings removed):
&lt;/p&gt;
&lt;p&gt;
Running this benchmark from shell can be done like this:
&lt;/p&gt;



&lt;pre class="example"&gt;$ java -server -jar clojure.jar -i ca.clj -e "(benchmark)"
&lt;/pre&gt;




&lt;p&gt;
After a while of intensive contributing to the global warming we get some output:
&lt;/p&gt;



&lt;pre class="example"&gt;$ java -server -jar clojure.jar -i ca.clj -e "(benchmark)"
1.1.0-alpha-SNAPSHOT
S 32x32, 32768 iteration(s): "Elapsed time: 415506.172 msecs"
P 32x32, 32768 iteration(s): "Elapsed time: 289353.533 msecs"
S 64x64, 8192 iteration(s): "Elapsed time: 425244.632 msecs"
P 64x64, 8192 iteration(s): "Elapsed time: 290253.746 msecs"
S 128x128, 2048 iteration(s): "Elapsed time: 421812.085 msecs"
P 128x128, 2048 iteration(s): "Elapsed time: 287244.863 msecs"
S 256x256, 512 iteration(s): "Elapsed time: 422331.897 msecs"
P 256x256, 512 iteration(s): "Elapsed time: 289623.448 msecs"
S 512x512, 128 iteration(s): "Elapsed time: 440064.214 msecs"
P 512x512, 128 iteration(s): "Elapsed time: 286282.461 msecs"
&lt;/pre&gt;




&lt;p&gt;
One thing you might have noticed is that the number of iterations increases as the &lt;i&gt;brain&lt;/i&gt; size
decreases.  This is to have the number of cells processed in each simulation constant and the
resulting numbers easy to compare.
&lt;/p&gt;
&lt;p&gt;
I also run all the "benchmarks" on two JVMs (server VMs at that): OSX default 1.5 (32-bit) and 1.6
(64-bit, there's only server VM).  The switching is done using the Java Preferences utility:
&lt;/p&gt;



&lt;pre class="example"&gt;$ java -version
java version "1.5.0_20"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-315)
Java HotSpot(TM) Client VM (build 1.5.0_20-141, mixed mode, sharing)

$ java -server -version
java version "1.5.0_20"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-315)
Java HotSpot(TM) Server VM (build 1.5.0_20-141, mixed mode)

$ java -version
java version "1.6.0_15"
Java(TM) SE Runtime Environment (build 1.6.0_15-b03-226)
Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-92, mixed mode)
&lt;/pre&gt;




&lt;p&gt;
Here are the numbers for the original Brian's Brain implementation:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-1.png"  alt="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-1.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
Looks like the parallel version is about 45% (JVM1.5) to 60% (JVM1.6) faster than the serial.
&lt;/p&gt;
&lt;p&gt;
The times for 512x512 are not included for JVM1.6 because the serial version was running for
2416341.265ms (that's about 40 minutes!), which is way out of scale for this graph. I guess it is
because of tight memory situation because the next (parallel) benchmark bailed out with out of
memory exception.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-5" class="outline-2"&gt;
&lt;h2 id="sec-5"&gt;Going transient &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;


&lt;p&gt;
Lau's code for transient version is much worse.  More global variables.  And functions depending on
them.  What a mess&amp;hellip; Just look at it:
&lt;/p&gt;



&lt;pre class="src src-clojure"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;dim-board&lt;/span&gt;   [130  130])
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;dim-screen&lt;/span&gt;  [600  600])
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;dim-scale&lt;/span&gt;   (&lt;span style="color: #cd0000;"&gt;vec&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;map&lt;/span&gt; / dim-screen dim-board)))
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;board-size&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;apply&lt;/span&gt; * dim-board))
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;board&lt;/span&gt;      (&lt;span style="color: #4682b4; font-weight: bold;"&gt;-&amp;gt;&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;vec&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;repeat&lt;/span&gt; board-size &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;))
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;                    (&lt;span style="color: #cd0000;"&gt;assoc&lt;/span&gt; (/ (dim-board 0) 2)       &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;)
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;                    (&lt;span style="color: #cd0000;"&gt;assoc&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;inc&lt;/span&gt; (/ (dim-board 0) 2)) &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;)))
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;
&lt;span class="linenr"&gt;10:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;cords&lt;/span&gt; (&lt;span style="color: #4682b4; font-weight: bold;"&gt;for&lt;/span&gt; [y (&lt;span style="color: #cd0000;"&gt;range&lt;/span&gt; (dim-board 0)) x (&lt;span style="color: #cd0000;"&gt;range&lt;/span&gt; (dim-board 1))] [x y]))
&lt;span class="linenr"&gt;11:  &lt;/span&gt;
&lt;span class="linenr"&gt;12:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;torus-coordinate&lt;/span&gt;
&lt;span class="linenr"&gt;13:  &lt;/span&gt;  [idx]
&lt;span class="linenr"&gt;14:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;cond&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;neg?&lt;/span&gt; idx)          (+ idx board-size)
&lt;span class="linenr"&gt;15:  &lt;/span&gt;        (&amp;gt;= idx board-size) (- idx board-size)
&lt;span class="linenr"&gt;16:  &lt;/span&gt;    &lt;span style="color: #cd0000;"&gt;:else&lt;/span&gt; idx))
&lt;span class="linenr"&gt;17:  &lt;/span&gt;
&lt;span class="linenr"&gt;18:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;above&lt;/span&gt;     (- (dim-board 0)))
&lt;span class="linenr"&gt;19:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;below&lt;/span&gt;     (dim-board 0))
&lt;span class="linenr"&gt;20:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;neighbors&lt;/span&gt; [-1 1 (dec above) above (&lt;span style="color: #cd0000;"&gt;inc&lt;/span&gt; above) (dec below) below (&lt;span style="color: #cd0000;"&gt;inc&lt;/span&gt; below)])
&lt;span class="linenr"&gt;21:  &lt;/span&gt;
&lt;span class="linenr"&gt;22:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;on-neighbors&lt;/span&gt; [i board]
&lt;span class="linenr"&gt;23:  &lt;/span&gt;  (&lt;span style="color: #cd0000;"&gt;count&lt;/span&gt;
&lt;span class="linenr"&gt;24:  &lt;/span&gt;   (&lt;span style="color: #cd0000;"&gt;filter&lt;/span&gt; #(= &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt; (nth board %))
&lt;span class="linenr"&gt;25:  &lt;/span&gt;           (&lt;span style="color: #cd0000;"&gt;map&lt;/span&gt; #(torus-coordinate (+ i %)) neighbors))))
&lt;span class="linenr"&gt;26:  &lt;/span&gt;
&lt;span class="linenr"&gt;27:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;step&lt;/span&gt; [board]
&lt;span class="linenr"&gt;28:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; [i 0 next-state (transient board)]
&lt;span class="linenr"&gt;29:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (&amp;lt; i board-size)
&lt;span class="linenr"&gt;30:  &lt;/span&gt;      (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; [self         (nth board i)]
&lt;span class="linenr"&gt;31:  &lt;/span&gt;        (&lt;span style="color: #4682b4; font-weight: bold;"&gt;recur&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;inc&lt;/span&gt; i)
&lt;span class="linenr"&gt;32:  &lt;/span&gt;               (assoc! next-state i (&lt;span style="color: #4682b4; font-weight: bold;"&gt;cond&lt;/span&gt;
&lt;span class="linenr"&gt;33:  &lt;/span&gt;                                      (= &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;    self)                &lt;span style="color: #cd0000;"&gt;:dying&lt;/span&gt;
&lt;span class="linenr"&gt;34:  &lt;/span&gt;                                      (= &lt;span style="color: #cd0000;"&gt;:dying&lt;/span&gt; self)                &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;
&lt;span class="linenr"&gt;35:  &lt;/span&gt;                                      (= 2 (on-neighbors i board))   &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;
&lt;span class="linenr"&gt;36:  &lt;/span&gt;                                      &lt;span style="color: #cd0000;"&gt;:else&lt;/span&gt;                          &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;))))
&lt;span class="linenr"&gt;37:  &lt;/span&gt;      (persistent! next-state))))
&lt;/pre&gt;




&lt;p&gt;
9 global variables in 37 lines of code!  Bad Lau!  Bad!  Also, the results of &lt;code&gt;torus-coordinate&lt;/code&gt;,
&lt;code&gt;on-neighbors&lt;/code&gt; and &lt;code&gt;step&lt;/code&gt; (in short – all of them) depend on things besides the parameters.  Very
un-functional indeed.
&lt;/p&gt;
&lt;p&gt;
Ok, let's put horrors aside and do what we can to get this mess under control.  My only goal is to
get the simulation and benchmarking functions working, so first, here they are:
&lt;/p&gt;



&lt;pre class="src src-clojure"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;simulate&lt;/span&gt; [steps initial]
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; [n steps, board initial]
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (&amp;lt; 0 n)
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;      (&lt;span style="color: #4682b4; font-weight: bold;"&gt;recur&lt;/span&gt; (dec n) (step board))
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;      board)))
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;benchmark&lt;/span&gt; []
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;  &lt;span style="color: #ffa500;"&gt;;; &lt;/span&gt;&lt;span style="color: #ffa500;"&gt;Warmup
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;&lt;/span&gt;  (simulate 10000 (make-board 16 16))
&lt;span class="linenr"&gt;10:  &lt;/span&gt;  (println (clojure-version))
&lt;span class="linenr"&gt;11:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;doseq&lt;/span&gt; [[w h i] [[32    32  32768]
&lt;span class="linenr"&gt;12:  &lt;/span&gt;                   [64    64  8192]
&lt;span class="linenr"&gt;13:  &lt;/span&gt;                   [128  128  2048]
&lt;span class="linenr"&gt;14:  &lt;/span&gt;                   [256  256  512]
&lt;span class="linenr"&gt;15:  &lt;/span&gt;                   [512  512  128]
&lt;span class="linenr"&gt;16:  &lt;/span&gt;                   [1024 1024 32]]]
&lt;span class="linenr"&gt;17:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; [initial (make-board w h)]
&lt;span class="linenr"&gt;18:  &lt;/span&gt;      (print (&lt;span style="color: #cd0000;"&gt;str&lt;/span&gt; &lt;span style="color: #008b00;"&gt;"S "&lt;/span&gt; w &lt;span style="color: #008b00;"&gt;"x"&lt;/span&gt; h &lt;span style="color: #008b00;"&gt;", "&lt;/span&gt; i &lt;span style="color: #008b00;"&gt;" iteration(s): "&lt;/span&gt;))
&lt;span class="linenr"&gt;19:  &lt;/span&gt;      (&lt;span style="color: #cd0000;"&gt;time&lt;/span&gt; (simulate i initial))
&lt;span class="linenr"&gt;20:  &lt;/span&gt;      (flush))))
&lt;/pre&gt;




&lt;p&gt;
Notice that the transient version cannot be parallelized by simply changing a single function, so
there is no parallel version at all.
&lt;/p&gt;



&lt;p&gt;
Here's the changed code that allows me to run the benchmark:
&lt;/p&gt;



&lt;pre class="src src-clojure"&gt;&lt;span class="linenr"&gt; 1:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;make-board&lt;/span&gt; [w h]
&lt;span class="linenr"&gt; 2:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; [above (- w), below w]
&lt;span class="linenr"&gt; 3:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;def&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;neighbors&lt;/span&gt; [-1 1 (dec above) above (&lt;span style="color: #cd0000;"&gt;inc&lt;/span&gt; above) (dec below) below (&lt;span style="color: #cd0000;"&gt;inc&lt;/span&gt; below)]))
&lt;span class="linenr"&gt; 4:  &lt;/span&gt;
&lt;span class="linenr"&gt; 5:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;-&amp;gt;&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;vec&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;repeat&lt;/span&gt; (* w h) &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;))
&lt;span class="linenr"&gt; 6:  &lt;/span&gt;      (&lt;span style="color: #cd0000;"&gt;assoc&lt;/span&gt; (/ w 2)       &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;)
&lt;span class="linenr"&gt; 7:  &lt;/span&gt;      (&lt;span style="color: #cd0000;"&gt;assoc&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;inc&lt;/span&gt; (/ w 2)) &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;)))
&lt;span class="linenr"&gt; 8:  &lt;/span&gt;
&lt;span class="linenr"&gt; 9:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;torus-coordinate&lt;/span&gt; [idx board]
&lt;span class="linenr"&gt;10:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; [board-size (&lt;span style="color: #cd0000;"&gt;count&lt;/span&gt; board)]
&lt;span class="linenr"&gt;11:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;cond&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;neg?&lt;/span&gt; idx)          (+ idx board-size)
&lt;span class="linenr"&gt;12:  &lt;/span&gt;          (&amp;gt;= idx board-size) (- idx board-size)
&lt;span class="linenr"&gt;13:  &lt;/span&gt;          &lt;span style="color: #cd0000;"&gt;:else&lt;/span&gt; idx)))
&lt;span class="linenr"&gt;14:  &lt;/span&gt;
&lt;span class="linenr"&gt;15:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;on-neighbors&lt;/span&gt; [i board]
&lt;span class="linenr"&gt;16:  &lt;/span&gt;  (&lt;span style="color: #cd0000;"&gt;count&lt;/span&gt;
&lt;span class="linenr"&gt;17:  &lt;/span&gt;   (&lt;span style="color: #cd0000;"&gt;filter&lt;/span&gt; #(= &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt; (nth board %))
&lt;span class="linenr"&gt;18:  &lt;/span&gt;           (&lt;span style="color: #cd0000;"&gt;map&lt;/span&gt; #(torus-coordinate (+ i %) board) neighbors))))
&lt;span class="linenr"&gt;19:  &lt;/span&gt;
&lt;span class="linenr"&gt;20:  &lt;/span&gt;(&lt;span style="color: #4682b4; font-weight: bold;"&gt;defn&lt;/span&gt; &lt;span style="color: #ff0000; font-weight: bold;"&gt;step&lt;/span&gt; [board]
&lt;span class="linenr"&gt;21:  &lt;/span&gt;  (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; [board-size (&lt;span style="color: #cd0000;"&gt;count&lt;/span&gt; board)]
&lt;span class="linenr"&gt;22:  &lt;/span&gt;    (&lt;span style="color: #4682b4; font-weight: bold;"&gt;loop&lt;/span&gt; [i 0 next-state (transient board)]
&lt;span class="linenr"&gt;23:  &lt;/span&gt;      (&lt;span style="color: #4682b4; font-weight: bold;"&gt;if&lt;/span&gt; (&amp;lt; i board-size)
&lt;span class="linenr"&gt;24:  &lt;/span&gt;        (&lt;span style="color: #4682b4; font-weight: bold;"&gt;let&lt;/span&gt; [self         (nth board i)]
&lt;span class="linenr"&gt;25:  &lt;/span&gt;          (&lt;span style="color: #4682b4; font-weight: bold;"&gt;recur&lt;/span&gt; (&lt;span style="color: #cd0000;"&gt;inc&lt;/span&gt; i)
&lt;span class="linenr"&gt;26:  &lt;/span&gt;                 (assoc! next-state i (&lt;span style="color: #4682b4; font-weight: bold;"&gt;cond&lt;/span&gt;
&lt;span class="linenr"&gt;27:  &lt;/span&gt;                                       (= &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;    self)                &lt;span style="color: #cd0000;"&gt;:dying&lt;/span&gt;
&lt;span class="linenr"&gt;28:  &lt;/span&gt;                                       (= &lt;span style="color: #cd0000;"&gt;:dying&lt;/span&gt; self)                &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;
&lt;span class="linenr"&gt;29:  &lt;/span&gt;                                       (= 2 (on-neighbors i board))   &lt;span style="color: #cd0000;"&gt;:on&lt;/span&gt;
&lt;span class="linenr"&gt;30:  &lt;/span&gt;                                       &lt;span style="color: #cd0000;"&gt;:else&lt;/span&gt;                          &lt;span style="color: #cd0000;"&gt;:off&lt;/span&gt;))))
&lt;span class="linenr"&gt;31:  &lt;/span&gt;        (persistent! next-state)))))
&lt;/pre&gt;




&lt;p&gt;
&lt;code&gt;make-board&lt;/code&gt; is ugly, and defines global variables.  Will have to live with it. &lt;code&gt;torus-coordinate&lt;/code&gt;
and &lt;code&gt;step&lt;/code&gt; don't use global variables any more.  And we're down to one global variable which
&lt;code&gt;on-neighbors&lt;/code&gt; depends on.  I'll let this one remain on Lau's conscience.
&lt;/p&gt;
&lt;p&gt;
Another thing that I just can't let slip is that the &lt;code&gt;torus-coordinate&lt;/code&gt; function is JUST PLAIN
WRONG.  Sorry, I might have raised my voice.  The problem is that for the cell in the leftmost
column the cell to its left is on the opposite side of the board.  Similarly for the rightmost.  But
&lt;code&gt;torus-coordinate&lt;/code&gt; cheats and for the leftmost returns the cell on the rightmost column (correct)
but one row above (wrong!).  For the rightmost cells it returns the cell on the leftmost column, but
one row below.  This error would instantly disqualify the transient implementation in any
half-serious apples-to-apples comparison.
&lt;/p&gt;
&lt;p&gt;
It is easy to see that &lt;code&gt;torus-coordinate&lt;/code&gt; is called for every "off" cell in a &lt;i&gt;brain&lt;/i&gt;.  And there
are more "off" cells than "dying" or "on" or the sum of those, respectively (based on my
observations of running the simulations graphically; empirical proof is easy, but will have to
wait).  Allowing this function to do less work definitely impacts its performance.  I'm not going to
write the correct Clojure version of this function so let's pretend everything is just fine and look
at the performance:
&lt;/p&gt;



&lt;pre class="example"&gt;$ java -server -jar clojure.jar -i transient-ca.clj -e "(benchmark)"
1.1.0-alpha-SNAPSHOT
S 32x32, 32768 iteration(s): "Elapsed time: 68259.648 msecs"
S 64x64, 8192 iteration(s): "Elapsed time: 69464.992 msecs"
S 128x128, 2048 iteration(s): "Elapsed time: 67646.623 msecs"
S 256x256, 512 iteration(s): "Elapsed time: 70889.975 msecs"
S 512x512, 128 iteration(s): "Elapsed time: 72941.713 msecs"
S 1024x1024, 32 iteration(s): "Elapsed time: 73866.637 msecs"
&lt;/pre&gt;




&lt;p&gt;
Quite an improvement I'd say, indeed.  And since the benchmark finished relatively fast, I decided
to run it again:
&lt;/p&gt;



&lt;pre class="example"&gt;$ java -server -jar clojure.jar -i transient-ca.clj -e "(benchmark)"
1.1.0-alpha-SNAPSHOT
S 32x32, 32768 iteration(s): "Elapsed time: 90808.629 msecs"
S 64x64, 8192 iteration(s): "Elapsed time: 92661.493 msecs"
S 128x128, 2048 iteration(s): "Elapsed time: 89570.418 msecs"
S 256x256, 512 iteration(s): "Elapsed time: 92398.728 msecs"
S 512x512, 128 iteration(s): "Elapsed time: 94843.865 msecs"
S 1024x1024, 32 iteration(s): "Elapsed time: 97478.63 msecs"
&lt;/pre&gt;




&lt;p&gt;
Weird is the least I can say about this.  So I run it one more time&amp;hellip; And then again.  All the
times getting different timings (I was not doing anything else while running the benchmarks on the
computer, so I don't have a slightest clue about the cause of variation).  Then I gave up.  Here's a
graph of all four runs:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-2.png"  alt="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-2.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
Three more runs on JVM1.6, which are quite a bit faster and closer to each other.  But they run out
of memory on the 1024x1024 run.
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-3.png"  alt="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-3.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
Here are the averages of all the runs together in one graph:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-4.png"  alt="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-4.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
Around 25% speed increase in JVM1.6 if I'm reading the numbers correctly.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-6" class="outline-2"&gt;
&lt;h2 id="sec-6"&gt;Wrapping things up &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;


&lt;p&gt;
Now, the complete picture of original and transient implementations:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-5.png"  alt="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-5.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
As expected the lines are quite horizontal, but not in all cases. Especially non-horizontal are the
original implementation on the JVM1.6.  It might be related to it being 64-bit, but I'm not really
going to investigate this any further.  So, as we can see, the transient version is about 4 times
faster than the parallel original version, and about 6 times faster than serial version.
&lt;/p&gt;
&lt;p&gt;
It is apparent that the transient version is quite a bit faster than the original.  And with the
original version we can see that the parallel version really does speed things up on two cores.  How
about 8 cores?  It so happens that I have SSH access to a quite recent Mac Pro&lt;sup&gt;&lt;a class="footref" name="fnr.3" href="#fn.3"&gt;3&lt;/a&gt;&lt;/sup&gt;.  One notable
thing about this system is that it is set to energy-saving mode, so don't be surprised by the
absolute numbers, look for their relative magnitudes:
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-6.png"  alt="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-6.png" /&gt;
&lt;/p&gt;
&lt;p&gt;
Watching the java process in &lt;code&gt;top&lt;/code&gt; was showing around 600% CPU usage when running the parallel
simulations.  What surprises me is that the performance gain from more cores is not there.  But
surprise is not big enough for me to go investigate this any further.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-7" class="outline-2"&gt;
&lt;h2 id="sec-7"&gt;Final words &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-7"&gt;


&lt;p&gt;
This is only an introduction for what I really wanted to write about.  It turned out much longer
than I thought it would be.  Cangratulations to everybody who got this far (reading the content, not
searching for pretty pictures).
&lt;/p&gt;
&lt;p&gt;
This article is in no way intended as bashing on anything.  Yes, I didsagree with Lau on some
software design points, but that should be only considered as that – disagreement.  I have shown how
I'd write some parts of his program.  He will be welcome to comment on my Common Lisp implementation
after my next post.
&lt;/p&gt;
&lt;p&gt;
Please don't use these numbers to draw any conclusions about the performance of my computer,
Clojure, Java, my or Lau's code or anything else.  The numbers are as un-scientific as can be.  I
was intentionally not doing anyhting else while running the benchmarks on the computer, but there
are all the usual desktop things running (except the screensaver which I disabled).
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id="outline-container-8" class="outline-2"&gt;
&lt;h2 id="sec-8"&gt;Data files &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-8"&gt;


&lt;p&gt;
Original: &lt;a href="http://www.ltn.lv/~jonis/blog/1-bb-clj/ca.clj"&gt;source&lt;/a&gt; &lt;a href="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-ca-mbp.txt"&gt;MacBook Pro transcript&lt;/a&gt; &lt;a href="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-ca-mp.txt"&gt;MacPro transcript&lt;/a&gt;&lt;br/&gt;
Transient: &lt;a href="http://www.ltn.lv/~jonis/blog/1-bb-clj/transient-ca.clj"&gt;source&lt;/a&gt; &lt;a href="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-transient-mbp.txt"&gt;MacBook Pro transcript&lt;/a&gt; &lt;a href="http://www.ltn.lv/~jonis/blog/1-bb-clj/bm-transient-mp.txt"&gt;MacPro transcript&lt;/a&gt;
&lt;/p&gt;



&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-9" class="outline-2"&gt;
&lt;h2 id="sec-9"&gt;Updates &lt;/h2&gt;
&lt;div class="outline-text-2" id="text-9"&gt;


&lt;p&gt;
2009-10-28: Apparently I have been misguided about the energy-savings mode on the MacPro.  Going to the box
physically and checking if the option is turned on or not, I found out (to my big surprise) that the
option is gone.  For those who have a Mac can compare their "Energy Saver" window to what it was
like before: &lt;a href="http://support.apple.com/kb/HT3092"&gt;http://support.apple.com/kb/HT3092&lt;/a&gt;.  So, I'm not sure whether the "reduced" mode is
enabled or not.
&lt;/p&gt;



&lt;/div&gt;
&lt;/div&gt;


&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;
&lt;p class="footnote"&gt;&lt;sup&gt;&lt;a class="footnum" name="fn.1" href="#fnr.1"&gt;1&lt;/a&gt;&lt;/sup&gt; When I'm talking about Clojure or Common Lisp implementation, I'm talking about &lt;i&gt;brain's&lt;/i&gt;
implementation in that programming language respectively, not the implementation of language.
&lt;/p&gt;
&lt;p class="footnote"&gt;&lt;sup&gt;&lt;a class="footnum" name="fn.2" href="#fnr.2"&gt;2&lt;/a&gt;&lt;/sup&gt; 2.8GHz Intel Core 2 Duo, 6MB cache, 1.07GHz bus speed.
&lt;/p&gt;
&lt;p class="footnote"&gt;&lt;sup&gt;&lt;a class="footnum" name="fn.3" href="#fnr.3"&gt;3&lt;/a&gt;&lt;/sup&gt; 3GHz Quad-Core Intel Xeon (2 processors, 8 cores total), 8MB cache per processor. 1.33GHz bus
speed.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="postamble"&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6424155829455583334-6799044333746492652?l=t-b-o-g.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://t-b-o-g.blogspot.com/feeds/6799044333746492652/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-clojure.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6424155829455583334/posts/default/6799044333746492652'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6424155829455583334/posts/default/6799044333746492652'/><link rel='alternate' type='text/html' href='http://t-b-o-g.blogspot.com/2009/10/brians-brain-on-clojure.html' title='Brian&apos;s Brain on Clojure'/><author><name>Jānis Džeriņš</name><uri>http://www.blogger.com/profile/06846740575608764708</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
