The beauty of grey: Brian's brain, on Common Lisp, take 2

While writing code for the first version of Brian's Brain some things crept into my mind, but I left them out of the article to keep it focused. The main issues are:

Using symbols to represent cells is not very efficient since a symbol is a kind of object, which means it is a pointer, which means it takes 32 or 64 bits of storage. 2 bits would be enough in our case.
Computer memory is linear (well, from a programmer's point of view), and 2-dimensional arrays are figments of our imagination. I mean, in the end it's still a 1-dimensional array, with some support from the programming language to make it appear as something else. Is it worth the trouble to implement the 2D world with a 1D array if the language (Common Lisp in this case) provides us with 2D arrays?
Currently a new array is allocated for each simulation step, which is not strictly necessary. Each simulation step needs only two arrays – for current world state and the next one. Since all cells of the new state are overwritten on each step, we don't need to create a completely new array to store them – we can reuse the one from previous step.

Specialised arrays

I'll start with changing the cell representation – instead of symbols I'll use numbers. And the world is going to be a suitable array. This is also a good time to tell a bit about arrays and their types to those unfamiliar with Common Lisp.

The arrays I used for the first implementation of Brian's Brain were of type T, which is the most generic array type possible, and it can store any kind of objects. It is also possible to specify the type of objects that will be stored in an array when creating it. And Common Lisp implementation will choose the most specific type of array that can store objects of specified type.

In my case I want to use array store numbers from 0 to 2. There is an integer type in Common Lisp which is the type of mathematical integer (with unlimited magnitude). It is also possible to limit the magnitude by specifying the lower and upper limit. Here's how to create an array that is suitable for storing integers from 0 to 2 in Common Lisp (SBCL):

1:  CL-USER> (make-array 7 :element-type '(integer 0 2))
2:  #(0 0 0 0 0 0 0)

We can also check what is the exact type of the object that has been created:

3:  CL-USER> (type-of (make-array 7 :element-type '(integer 0 2)))
4:  (SIMPLE-ARRAY (UNSIGNED-BYTE 2) (7))

As we can see, the type of the array created is (unsigned-byte 2), which means all non-negative integers representable by 2 bits (see the documentation of unsigned-byte for more details). The type returned is more general than what I have asked for, but as I said, it is up to Common Lisp implementation to choose what type of array to create. There is also a function that can tell what type of array will be created given a type specifier. Let's see, again on SBCL:

5:  CL-USER> (upgraded-array-element-type '(integer 0 2))
6:  (UNSIGNED-BYTE 2)
7:  CL-USER> (upgraded-array-element-type '(integer 0 42))
8:  (UNSIGNED-BYTE 7)

A different Common Lisp implementation is allowed to work differently. Here is what 32-bit Clozure CL tells us:

 9:  CL-USER> (upgraded-array-element-type '(integer 0 2))
10:  (UNSIGNED-BYTE 8)
11:  CL-USER> (upgraded-array-element-type '(integer 0 42))
12:  (UNSIGNED-BYTE 8)
13:  CL-USER> (type-of (make-array 7 :element-type '(integer 0 2)))
14:  (SIMPLE-ARRAY (UNSIGNED-BYTE 8) (7))

As you can see, in Clozure CL 8-bit-byte is the most specific array type available for arrays of small integers. There are many places in Common Lisp specification where implementations are given freedom of implementation choices. For instance, this specific issue is described in array upgrading section.

So now we are ready to work with numbers as cell representations. 0 for dead cell, 1 for alive cell, and 2 for dying cell:

15:  (defun make-brain (w h)
16:    (make-array (list h w) :element-type '(integer 0 2)))
17:  
18:  (defun make-initialised-brain (w h)
19:    (let ((cells (make-brain w h))
20:          (mid (floor w 2)))
21:      (setf (aref cells 0 mid) 1)
22:      (setf (aref cells 0 (1+ mid)) 1)
23:      cells))
24:  
25:  (defun rules (state neighbours)
26:    (case state
27:      (1 2)
28:      (2 0)
29:      (t (if (= 2 (count 1 neighbours)) 1 0))))

All other code stays as it was. Now is a good time to recall the odd graph from last time (scaled to match other graphs below):

And the new version:

Quite a nice improvement for such a small change. I guess my theory about GCing large arrays was correct. The times for Clozure CL have also improved, not quite that dramatically though:

What do we learn? Lisp programmers can learn the price of some things¹.

Destructive action

Next step is to avoid allocating a new array at each evolution step. I'm not expecting any serious performance gains from this because at this point each iteration is creating a single object that does not have to be traversed by GC. Let's see if I can shed some light on the believers in manual memory management lurking in the dark corners of intarwebs.

First I'll make evolve take another parameter – the array where the next generation cells will be put into:

30:  (defun evolve (src dst)
31:    (let* ((w (array-dimension src 1))
32:           (h (array-dimension src 0)))
33:      (loop for j below h
34:         do (loop for i below w
35:               do (setf (aref dst j i)
36:                        (funcall 'rules (aref src j i) (neighbours src i j)))))
37:      dst))

Easy, one parameter more, one line less. Now simulate will need two brains and will alternate them at each call to evolve:

38:  (defun simulate (steps a b)
39:    (loop repeat steps
40:       do (psetf a (evolve a b)
41:                 b a)
42:       finally (return a)))

benchmark also must be updated to pass brain (of proper size) to simulate:

43:  (defun benchmark ()
44:    (format *trace-output* "Benchmarking on ~A ~A~%"
45:            (lisp-implementation-type)
46:            (lisp-implementation-version))
47:    ;; Warmup.
48:    (simulate 10000 (make-initialised-brain 16 16))
49:    (loop
50:       for (w h i) in '((32    32  32768)
51:                        (64    64  8192)
52:                        (128  128  2048)
53:                        (256  256  512)
54:                        (512  512  128)
55:                        (1024 1024 32)
56:                        (2048 2048 8)
57:                        (4096 4096 2))
58:       do #+ccl (gc)
59:          #+sbcl (gc :full t)
60:          (let ((initial (make-initialised-brain w h))
61:                (spare (make-brain w h)))
62:            (format *trace-output* "*** ~Dx~D ~D iteration~:P ***~%" w h i)
63:            (time (simulate i initial spare))
64:            (finish-output *trace-output*)))
65:    (values))

Here, take a close look at the graphs now:

Notice anything interesting? Right, the performance has not improved noticeably. It has actually decreased slightly for SBCL and 64-bit Clozure CL. What's the lesson? Garbage Collector will make your life easier (if you don't deliberately spam it).

Dimension warp

The last exercise today will be to test whether it is worth the trouble of implementing 2-dimensional arrays on top of 1-dimensional arrays manually. This means doing the index calculations manually. I'll branch this experiment off the numerical branch².

Let's start with the easy things – creating the arrays:

66:  (defun make-brain (w h)
67:    (make-array (* w h) :element-type '(integer 0 2)))

All that has to be changed in make-initialised-brain is to remove the extra dimension (because we only change the first row, and first row in our array are first w cells):

68:  (defun make-initialised-brain (w h)
69:    (let ((cells (make-brain w h))
70:          (mid (floor w 2)))
71:      (setf (aref cells mid) 1)
72:      (setf (aref cells (1+ mid)) 1)
73:      cells))

The main function that has to be changed is neighbours. The function is very simple, although a bit lengthy (comparing to other incorrect versions), but I like to keep the intent of the code close to how I'd do things myself if I were pretending to be a computer:

74:  (defun neighbours (cells i w h)
75:    (let ((l (length cells))
76:          (mx (1- w))
77:          (my (1- h)))
78:      (multiple-value-bind (y x)
79:          (truncate i w)
80:        (flet ((up (i) (if (zerop y) (- (+ i l) w) (- i w)))
81:               (dn (i) (if (= y  my) (- (+ i w) l) (+ i w)))
82:               (lt (i) (if (zerop x) (1- (+ i w))  (1- i)))
83:               (rt (i) (if (= x  mx) (1+ (- i w))  (1+ i))))
84:          (let* ((u (up i))
85:                 (d (dn i))
86:                 (l (lt i))
87:                 (r (rt i))
88:                 (ul (lt u))
89:                 (ur (rt u))
90:                 (dl (lt d))
91:                 (dr (rt d)))
92:            (mapcar (lambda (i) (aref cells i))
93:                    (list ul u ur l r dl d dr)))))))

cells is the 1-dimensional array that is used to simulate the 2-dimensional array. i is the index of a cell for which the neighbours should be calculated. And since cells is a 1-dimensional array and does not have any information about what dimensions are stored in it, w and h specify the dimensions of 2-dimensional array. (If it is not clear to you why these dimensions must be specified think about how many different 2-dimensional arrays can be stored in 24-element 1-dimensional array).

I have written this function in a way that allows me to think about the 1D array as a 2D array. So from i and w the x and y coordinates are calculated. These are used to detect if the cell under question is on the top (0) row or leftmost (0) column. For bottom row and rightmost column I have my and mx, respectively.

Then come four local utility functions to determine an index that is up (up), down (dn), left (lt) or right (rt) of a given cell. The logic is the same as in the 2D array version.

Next I calculate indices for up, down, left and right cells using the above mentioned functions. The cells on diagonals are calculated relative to these. For instance, up-left cell can be obtained by going up from left cell or going left from up cell.

In the end the obtained indices are used to get cell values from cells array. And that's all there is to it.

Now evolve also can be simplified a bit since only one dimension has to be traversed. However, the brain dimensions must be specified explicitly since the dimensions of the brain cannot be obtained from the 1D array.

 94:  (defun evolve (src w h)
 95:    (let* ((l (length src))
 96:           (dst (make-brain w h)))
 97:      (loop for i below l
 98:         do (setf (aref dst i)
 99:                  (funcall 'rules (aref src i) (neighbours src i w h))))
100:      dst))

The dimensions must be dragged through simulate function, too:

101:  (defun simulate (steps initial w h)
102:    (loop with brain = initial
103:       repeat steps
104:       do (setf brain (funcall 'evolve brain w h))
105:       finally (return brain)))

The code gets ever uglier. Another way to solve this problem would be to create a container object (like a structure or class) to hold the cells array and the two dimensions. But that is really not the point of this exercise – I actually expect the gains from this change to be so minimal (if any!) that it is not worth the trouble at all.

OK, last change – brain dimensions must be passed to simulate:

106:  (defun benchmark ()
107:    (format *trace-output* "Benchmarking on ~A ~A~%"
108:            (lisp-implementation-type)
109:            (lisp-implementation-version))
110:    ;; Warmup.
111:    (simulate 10000 (make-initialised-brain 16 16) 16 16)
112:    (loop
113:       for (w h i) in '((32    32  32768)
114:                        (64    64  8192)
115:                        (128  128  2048)
116:                        (256  256  512)
117:                        (512  512  128)
118:                        (1024 1024 32)
119:                        (2048 2048 8)
120:                        (4096 4096 2))
121:       do #+ccl (gc)
122:          #+sbcl (gc :full t)
123:          (let ((initial (make-initialised-brain w h)))
124:            (format *trace-output* "*** ~Dx~D ~D iteration~:P ***~%" w h i)
125:            (time (simulate i initial w h))
126:            (finish-output *trace-output*)))
127:    (values))

So, let's see what do we get:

Oh man was I wrong! Why would the difference be so big? A profiler would come in handy, but before that I'd like to do another experiment because I have a feeling about the source of slowness: there are no declarations and generic array access code for 2D arrays might be more complex than for 1D arrays. Or maybe I should give profiler a shot?

Looking into future

The problem is that performance tweaking this brute-force solution seems a bit pointless. The point of these articles was to show how I would implement the Brian's Brain in Common Lisp (and do some apples-to-oranges comparisons to spice things up). A really interesting thing to try out would be to implement Hashlife, which promises an astronomic speed explosion, and see how it works with Brian's Brain. And I suspect that generating "6 octillion generations in 30 seconds" would not quite work out because Brian's Brain is a lot more chaotic (I have not noticed a single oscillator while watching my animations) – the presence of "dead" cells force a lot of movement (and expansion). I noticed quite a few spaceships, though.

I also installed Golly – a wonderful application to play with cellular automata. It has the Hashlife algorithm implemented, and it really does wonders. However it did not work out for Brian's Brain (starting with 2 live cells as I've been doing all this time) because:

In 30 seconds it does not even get to generation 2000.
The pattern is ever-expanding and I could not find any way to limit the world size (i.e., I think the torus-world is not implemented in Golly).

So I'd have to implement Hashlife myself. Except that I'm hesitant to hand out $30 for the original B. Gosper's paper, but might as well do it in the future (since as an extra bonus the original implementation was done on Lisp Machine).

So, I think I'll do one more iteration with the brute-force method (with profiling and declarations). And then we'll see what will happen.

Data files

Sources:

numeric.lisp
destructive.lisp
1d.lisp

Benchmarking transcripts:

bm-num-sbcl.txt
bm-num-ccl32.txt
bm-num-ccl64.txt
bm-des-sbcl.txt
bm-des-ccl32.txt
bm-des-ccl64.txt
bm-1d-sbcl.txt
bm-1d-ccl32.txt
bm-1d-ccl64.txt

Updates

2009-11-01: Please don't send me any more copies of B. Gosper's paper :)

Footnotes

¹ Alan Perlis, Epigrams in Programming, "55. A LISP programmer knows the value of everything, but the cost of nothing."

² Yes, I'm using git, and branching like crazy, even in this toy project.

2 comments:

Jianshi Huang18 November 2009 at 08:57
Hi Jānis,

Thanks for your posts on how to optimize clojure and common lisp code.

I've optimized your code heavily based on your previous post (simple.lisp). The code is optimized for SBCL. Please check it out if you are interested.

http://huangjs.net/code/brain.lisp

Cheers,
Jianshi
Ruben Berenguel13 December 2009 at 21:52
Hi,

I just read (diagonally) your post. Very interesting stuff on improving performance of Lisp code! Good work. I even stumbled it for a later, more patient read.

Thanks!

The beauty of grey

2009-10-29

Brian's brain, on Common Lisp, take 2