- Using symbols to represent cells is not very efficient since a symbol is a kind of object, which means it is a pointer, which means it takes 32 or 64 bits of storage. 2 bits would be enough in our case.
- Computer memory is linear (well, from a programmer's point of view), and 2-dimensional arrays are figments of our imagination. I mean, in the end it's still a 1-dimensional array, with some support from the programming language to make it appear as something else. Is it worth the trouble to implement the 2D world with a 1D array if the language (Common Lisp in this case) provides us with 2D arrays?
- Currently a new array is allocated for each simulation step, which is not strictly necessary. Each simulation step needs only two arrays – for current world state and the next one. Since all cells of the new state are overwritten on each step, we don't need to create a completely new array to store them – we can reuse the one from previous step.
I'll start with changing the cell representation – instead of symbols I'll use numbers. And the world is going to be a suitable array. This is also a good time to tell a bit about arrays and their types to those unfamiliar with Common Lisp.
The arrays I used for the first implementation of Brian's Brain were of type
T, which is the most
generic array type possible, and it can store any kind of objects. It is also possible to specify
the type of objects that will be stored in an array when creating it. And Common Lisp
implementation will choose the most specific type of array that can store objects of specified type.
In my case I want to use array store numbers from 0 to 2. There is an
integer type in Common Lisp
which is the type of mathematical integer (with unlimited magnitude). It is also possible to
limit the magnitude by specifying the lower and upper limit. Here's how to create an array that is
suitable for storing integers from 0 to 2 in Common Lisp (SBCL):
1: CL-USER> (make-array 7 :element-type '(integer 0 2)) 2: #(0 0 0 0 0 0 0)
We can also check what is the exact type of the object that has been created:
3: CL-USER> (type-of (make-array 7 :element-type '(integer 0 2))) 4: (SIMPLE-ARRAY (UNSIGNED-BYTE 2) (7))
As we can see, the type of the array created is
(unsigned-byte 2), which means all non-negative
integers representable by 2 bits (see the documentation of
unsigned-byte for more details).
The type returned is more general than what I have asked for, but as I said, it is up to Common Lisp
implementation to choose what type of array to create. There is also a function that can tell what
type of array will be created given a type specifier. Let's see, again on SBCL:
5: CL-USER> (upgraded-array-element-type '(integer 0 2)) 6: (UNSIGNED-BYTE 2) 7: CL-USER> (upgraded-array-element-type '(integer 0 42)) 8: (UNSIGNED-BYTE 7)
A different Common Lisp implementation is allowed to work differently. Here is what 32-bit Clozure CL tells us:
9: CL-USER> (upgraded-array-element-type '(integer 0 2)) 10: (UNSIGNED-BYTE 8) 11: CL-USER> (upgraded-array-element-type '(integer 0 42)) 12: (UNSIGNED-BYTE 8) 13: CL-USER> (type-of (make-array 7 :element-type '(integer 0 2))) 14: (SIMPLE-ARRAY (UNSIGNED-BYTE 8) (7))
As you can see, in Clozure CL 8-bit-byte is the most specific array type available for arrays of small integers. There are many places in Common Lisp specification where implementations are given freedom of implementation choices. For instance, this specific issue is described in array upgrading section.
So now we are ready to work with numbers as cell representations. 0 for dead cell, 1 for alive cell, and 2 for dying cell:
15: (defun make-brain (w h) 16: (make-array (list h w) :element-type '(integer 0 2))) 17: 18: (defun make-initialised-brain (w h) 19: (let ((cells (make-brain w h)) 20: (mid (floor w 2))) 21: (setf (aref cells 0 mid) 1) 22: (setf (aref cells 0 (1+ mid)) 1) 23: cells)) 24: 25: (defun rules (state neighbours) 26: (case state 27: (1 2) 28: (2 0) 29: (t (if (= 2 (count 1 neighbours)) 1 0))))
All other code stays as it was. Now is a good time to recall the odd graph from last time (scaled to match other graphs below):
And the new version:
Quite a nice improvement for such a small change. I guess my theory about GCing large arrays was correct. The times for Clozure CL have also improved, not quite that dramatically though:
What do we learn? Lisp programmers can learn the price of some things1.
Next step is to avoid allocating a new array at each evolution step. I'm not expecting any serious performance gains from this because at this point each iteration is creating a single object that does not have to be traversed by GC. Let's see if I can shed some light on the believers in manual memory management lurking in the dark corners of intarwebs.
First I'll make
evolve take another parameter – the array where the next generation cells will be
30: (defun evolve (src dst) 31: (let* ((w (array-dimension src 1)) 32: (h (array-dimension src 0))) 33: (loop for j below h 34: do (loop for i below w 35: do (setf (aref dst j i) 36: (funcall 'rules (aref src j i) (neighbours src i j))))) 37: dst))
Easy, one parameter more, one line less. Now
simulate will need two brains and will alternate
them at each call to
38: (defun simulate (steps a b) 39: (loop repeat steps 40: do (psetf a (evolve a b) 41: b a) 42: finally (return a)))
benchmark also must be updated to pass brain (of proper size) to
43: (defun benchmark () 44: (format *trace-output* "Benchmarking on ~A ~A~%" 45: (lisp-implementation-type) 46: (lisp-implementation-version)) 47: ;; Warmup. 48: (simulate 10000 (make-initialised-brain 16 16)) 49: (loop 50: for (w h i) in '((32 32 32768) 51: (64 64 8192) 52: (128 128 2048) 53: (256 256 512) 54: (512 512 128) 55: (1024 1024 32) 56: (2048 2048 8) 57: (4096 4096 2)) 58: do #+ccl (gc) 59: #+sbcl (gc :full t) 60: (let ((initial (make-initialised-brain w h)) 61: (spare (make-brain w h))) 62: (format *trace-output* "*** ~Dx~D ~D iteration~:P ***~%" w h i) 63: (time (simulate i initial spare)) 64: (finish-output *trace-output*))) 65: (values))
Here, take a close look at the graphs now:
Notice anything interesting? Right, the performance has not improved noticeably. It has actually decreased slightly for SBCL and 64-bit Clozure CL. What's the lesson? Garbage Collector will make your life easier (if you don't deliberately spam it).
The last exercise today will be to test whether it is worth the trouble of implementing 2-dimensional arrays on top of 1-dimensional arrays manually. This means doing the index calculations manually. I'll branch this experiment off the numerical branch2.
Let's start with the easy things – creating the arrays:
66: (defun make-brain (w h) 67: (make-array (* w h) :element-type '(integer 0 2)))
All that has to be changed in
make-initialised-brain is to remove the extra dimension (because we
only change the first row, and first row in our array are first
68: (defun make-initialised-brain (w h) 69: (let ((cells (make-brain w h)) 70: (mid (floor w 2))) 71: (setf (aref cells mid) 1) 72: (setf (aref cells (1+ mid)) 1) 73: cells))
The main function that has to be changed is
neighbours. The function is very simple, although a
bit lengthy (comparing to other incorrect versions), but I like to keep the intent of the code
close to how I'd do things myself if I were pretending to be a computer:
74: (defun neighbours (cells i w h) 75: (let ((l (length cells)) 76: (mx (1- w)) 77: (my (1- h))) 78: (multiple-value-bind (y x) 79: (truncate i w) 80: (flet ((up (i) (if (zerop y) (- (+ i l) w) (- i w))) 81: (dn (i) (if (= y my) (- (+ i w) l) (+ i w))) 82: (lt (i) (if (zerop x) (1- (+ i w)) (1- i))) 83: (rt (i) (if (= x mx) (1+ (- i w)) (1+ i)))) 84: (let* ((u (up i)) 85: (d (dn i)) 86: (l (lt i)) 87: (r (rt i)) 88: (ul (lt u)) 89: (ur (rt u)) 90: (dl (lt d)) 91: (dr (rt d))) 92: (mapcar (lambda (i) (aref cells i)) 93: (list ul u ur l r dl d dr)))))))
cells is the 1-dimensional array that is used to simulate the 2-dimensional array.
i is the
index of a cell for which the neighbours should be calculated. And since
cells is a 1-dimensional
array and does not have any information about what dimensions are stored in it,
the dimensions of 2-dimensional array. (If it is not clear to you why these dimensions must be
specified think about how many different 2-dimensional arrays can be stored in 24-element
I have written this function in a way that allows me to think about the 1D array as a 2D array. So
y coordinates are calculated. These are used to detect if the cell
under question is on the top (
0) row or leftmost (
0) column. For bottom row and rightmost
column I have
Then come four local utility functions to determine an index that is up (
up), down (
lt) or right (
rt) of a given cell. The logic is the same as in the 2D array version.
Next I calculate indices for up, down, left and right cells using the above mentioned functions. The cells on diagonals are calculated relative to these. For instance, up-left cell can be obtained by going up from left cell or going left from up cell.
In the end the obtained indices are used to get cell values from
cells array. And that's all
there is to it.
evolve also can be simplified a bit since only one dimension has to be traversed. However,
the brain dimensions must be specified explicitly since the dimensions of the brain cannot be
obtained from the 1D array.
94: (defun evolve (src w h) 95: (let* ((l (length src)) 96: (dst (make-brain w h))) 97: (loop for i below l 98: do (setf (aref dst i) 99: (funcall 'rules (aref src i) (neighbours src i w h)))) 100: dst))
The dimensions must be dragged through
simulate function, too:
101: (defun simulate (steps initial w h) 102: (loop with brain = initial 103: repeat steps 104: do (setf brain (funcall 'evolve brain w h)) 105: finally (return brain)))
The code gets ever uglier. Another way to solve this problem would be to create a container object (like a structure or class) to hold the cells array and the two dimensions. But that is really not the point of this exercise – I actually expect the gains from this change to be so minimal (if any!) that it is not worth the trouble at all.
OK, last change – brain dimensions must be passed to
106: (defun benchmark () 107: (format *trace-output* "Benchmarking on ~A ~A~%" 108: (lisp-implementation-type) 109: (lisp-implementation-version)) 110: ;; Warmup. 111: (simulate 10000 (make-initialised-brain 16 16) 16 16) 112: (loop 113: for (w h i) in '((32 32 32768) 114: (64 64 8192) 115: (128 128 2048) 116: (256 256 512) 117: (512 512 128) 118: (1024 1024 32) 119: (2048 2048 8) 120: (4096 4096 2)) 121: do #+ccl (gc) 122: #+sbcl (gc :full t) 123: (let ((initial (make-initialised-brain w h))) 124: (format *trace-output* "*** ~Dx~D ~D iteration~:P ***~%" w h i) 125: (time (simulate i initial w h)) 126: (finish-output *trace-output*))) 127: (values))
So, let's see what do we get:
Oh man was I wrong! Why would the difference be so big? A profiler would come in handy, but before that I'd like to do another experiment because I have a feeling about the source of slowness: there are no declarations and generic array access code for 2D arrays might be more complex than for 1D arrays. Or maybe I should give profiler a shot?
Looking into future
The problem is that performance tweaking this brute-force solution seems a bit pointless. The point of these articles was to show how I would implement the Brian's Brain in Common Lisp (and do some apples-to-oranges comparisons to spice things up). A really interesting thing to try out would be to implement Hashlife, which promises an astronomic speed explosion, and see how it works with Brian's Brain. And I suspect that generating "6 octillion generations in 30 seconds" would not quite work out because Brian's Brain is a lot more chaotic (I have not noticed a single oscillator while watching my animations) – the presence of "dead" cells force a lot of movement (and expansion). I noticed quite a few spaceships, though.
I also installed Golly – a wonderful application to play with cellular automata. It has the Hashlife algorithm implemented, and it really does wonders. However it did not work out for Brian's Brain (starting with 2 live cells as I've been doing all this time) because:
- In 30 seconds it does not even get to generation 2000.
- The pattern is ever-expanding and I could not find any way to limit the world size (i.e., I think the torus-world is not implemented in Golly).
So I'd have to implement Hashlife myself. Except that I'm hesitant to hand out $30 for the original B. Gosper's paper, but might as well do it in the future (since as an extra bonus the original implementation was done on Lisp Machine).
So, I think I'll do one more iteration with the brute-force method (with profiling and declarations). And then we'll see what will happen.
2009-11-01: Please don't send me any more copies of B. Gosper's paper :)