Being an attempt to write a computational fluid dynamics simulation using no more than two raised to the power ten bytes.
First, before I begin, N.B.: apparently the extreme brevity required by this project has put me in rather a garrulous mood, so I apologize for the length of this post-mortem. If you're interested in maybe learning a few byte-shaving tricks for your own js1k, though, read on. Also, verbose variable names will be used for the sake of clarity, though keep in mind that all variables here have a single-letter identifier in the final program.
What it is
This project, written for the JS1K competition (demo here), is essentially a complete rewrite of an earlier, much more feature-rich project. You can have a look at the demo, which is probably more interesting than this 1K version, but which is an order of magnitude larger in the bytes department. If you're not familiar with the lattice Boltzmann method, there's a fairly nice introduction here (pdf), or you can read my explanation. The TL;DR version, though, is that this is a 2D computational fluid dynamics simulation. You can think of it as the surface of a pond that you're dragging your finger through.
So what does this have to do with the contest's theme, 'here be dragons'? Well, nothing, really. It's just an interesting problem I was working on recently, and I thought it would be fun to see if it was possible to achieve in 1K. So let's get to it, shall we?
I started this project with a slightly modified version of the original program mentioned earlier. The original program clocked in just north of 20000 bytes, uncompressed, and the modified version was around 4000 bytes. It fairly quickly became clear, though, that I wasn't going to be able to cut the size by three-quarters, and a fresh start was required. I wasn't quite so quick to accept this, though, and wasted a few commits trying to make it work. But once I had started fresh with just the core features (starting at about 1900 bytes), from there it was pretty rough sledding down to 1K.

Strangely, as difficult as it was initially to trim away bytes, once I hit 1K the bytes seemed to keep melting off. The smallest size I achieved was 944 bytes, although this version was mostly an attempt to test the limits, and, while it 'worked', it had some serious issues. From this low-byte point, I began to add features and improve performance, while continuing to strip away any extra bytes I could. At this stage, the program oscillated between ~990 and ~1200 bytes. As I added each new feature, the size of the program would balloon up over 1K, and I then had to fret over whether to keep the feature, swap it for another one, or else find some other way to strip away a few spare bytes. If I can coin an analogy, the whole process was a bit like throwing tic-tacs and pennies out the window of your Chrysler K-car to make it go faster. Anyway, here's a little of what I learned.
The things I'm most proud/ashamed of, or, speedups and speedbumps:
Flatten all the things
One of the early significant breakthroughs came with flattening every possible nested loop. As the lattice Boltzmann method makes use of a lattice (shocking, I know), it's perhaps most natural to represent this with a multi-dimensional array. In this case, where we are working with a two-dimensional lattice, a two-dimensional array would be most appropriate. In JavaScript, this would be represented by an n-length array, each element of which is an m-length array, where n and m are the width and height respectively (in this program, width and height are equal, to save having to cache both; further, it was decided that width and height would be 99, as this saves a single byte vs a triple-digit array size... seriously). Accessing this array would look like this: lattice[x][y].
The simplest way to loop over each of the elements of the lattice would be two nested loops, like so:
for (var x = 0; x < lattice_width; x++) {
  for (var y = 0; y < lattice_height; y++) {
    lattice[x][y];
  }
}
The lattice Boltzmann method requires looping over the array at least twice per tick, once for the streaming phase, and once for the collision phase (although it may be possible to do it in a single loop with some additional storage on each node and some more complicated logic to shuffle around distributions, I never looked into it closely enough to determine the feasibility of this option, so I leave it as an exercise for the reader). However, while it is necessary to iterate over the array at least twice, these iterations needn't be performed with nested loops. It is possible to loop over any n-by-m array with a single loop. To do this, you loop from 0 to the total number of items in the array (n*m), and determine the x and y coordinates on the fly, like so:
for (var i = 0; i < (lattice_width * lattice_height); i++) {
  var x = i % lattice_width; // x-coordinate
  var y = Math.floor(i / lattice_width); // y-coordinate
  lattice[x][y];
}
If it's not clear why this works, think of it this way: y increases by one every time we pass through another lattice_width values, which corresponds exactly to the row numbers. And x cycles between 0 and lattice_width - 1, which corresponds to the column numbers.
Although calculating the x and y coordinates does take a few extra bytes, the elimination of the inner loop more than makes up for this. For the most part, flattening these nested loops was fairly straightforward. However, I did have some problems flattening the draw loop. This loop iterates over a square region of the canvas image, and draws a colored square for each lattice node. Originally this loop looked like this:
for (var ypx = y_pos * px_per_node; ypx < (y_pos + 1) * px_per_node; ypx++) {
  for (var xpx = x_pos * px_per_node; xpx < (x_pos + 1) * px_per_node; xpx++) {
    var index = (xpx + ypx * image.width) * 4;
    image.data[index + 0] = color.red;
    image.data[index + 1] = color.green;
    image.data[index + 2] = color.blue;
    image.data[index + 3] = color.alpha;
  }
}
The additional logic in each loop initialization and condition, and in calculating the index, made this one a little more difficult to figure out. At its core, though, this loop is merely iterating over a square region of the image. Since I decided to fix the width and height of the image that's drawn to the canvas (which also simplified many other areas of the program and saved quite a few bytes), this loop eventually boiled down to this:
for (var i = 0; i < 36; i++) {
  var x = i % 6;
  var y = Math.floor(i / 6);
  var index = 4 * (x + (6 * x_pos) + image.width * (y + (6 * y_pos)));
}
where 36 is the fixed area of the square to be drawn, and 6 is the width and height of that square (AKA px_per_node). You may recognize the method of calculating x and y from earlier. The rest of the logic merely calculates the image index, and is an implementation detail of the image data array. When I was working on this late at night, this extra logic compounded the problem immensely, but after breaking it down into its essential components it became clear enough.
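Putting the two snippets together, the flattened draw loop with the pixel writes from the earlier version restored would look something like this (a reassembly for illustration, not a verbatim excerpt from the final program):

for (var i = 0; i < 36; i++) {
  var x = i % 6; // x within the 6x6 square
  var y = Math.floor(i / 6); // y within the 6x6 square
  var index = 4 * (x + (6 * x_pos) + image.width * (y + (6 * y_pos)));
  image.data[index + 0] = color.red;
  image.data[index + 1] = color.green;
  image.data[index + 2] = color.blue;
  image.data[index + 3] = color.alpha;
}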
A related trick that gained a few extra bytes was to flatten the two-dimensional array representing the lattice into a one-dimensional array. This complicates lookup slightly (each node is accessed via lattice[x+y*width] instead of lattice[x][y]), and you may notice that, even when the width variable is squashed to a single letter, lookup with this method actually takes one more byte. The small extra lookup cost was worth it, however, as I was able to eliminate an if test during initialization, which checked whether the initialization loop had reached a new column in the array, adding a new array if it had (it looked like this):
if (y_pos == 0) {
  lattice[x_pos] = [];
}
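For illustration, here's roughly what the flattened setup looks like (a minimal sketch with illustrative names; the real program obviously initializes each node with more than a placeholder density):

var width = 99; // the lattice is square, so only one dimension is cached
var lattice = [];
for (var i = 0; i < width * width; i++) {
  lattice[i] = { density: 1 }; // no 'if (y_pos == 0) lattice[x_pos] = []' bookkeeping needed
}
var x = 3, y = 4;
var node = lattice[x + y * width]; // one byte longer than lattice[x][y], but worth it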
Flows of data more vast than anything the world has seen
If you're trying to shave bytes, one of the most important things is efficient data storage. If you can eliminate the need for data storage altogether (e.g. with procedural generation), so much the better. But for this project, there was a small amount of data that was absolutely critical, and which, so far as I can tell, cannot be succinctly programmatically generated: namely, the velocities associated with each distribution function (DF), and the distribution function weights. In the lattice Boltzmann method, each node has a number of distribution functions, each representing a distribution of particle densities. In this program (which uses the D2Q9 discretization... which just means a two-dimensional lattice, with 9 velocities per node), each node has nine of these distribution functions. These are numbered from zero to eight. Zero represents the 'at-rest' velocity, one through four represent the cardinal direction velocities, and five through eight represent the ordinal direction velocities. In order for these distributions to stream in their direction of travel (i.e. move from one node to another), each must have some notion of what that direction is.

Originally I had stored these directions as a 9-by-2 array, where each inner array represented the delta x and y for its respective distribution to travel. This looked something like this: ND = [[0,0],[1,0],[0,-1],[-1,0],[0,1],[1,-1],[-1,-1],[-1,1],[1,1]]. So, for example, the '6' distribution travels -1 in the x direction, and -1 in the y direction. For a node at coordinates (100,100), after streaming, the '6' distribution from (100,100) would end up at coordinates (99,99). You may notice that the data above contains a fair number of non-data characters in the form of brackets and commas. Altogether, this array takes up 61 bytes.

My initial solution to reduce the size of this data, which was one of my largest blunders on this project, was to represent this data as a string. I will give you a moment to allow that to sink in. If it strikes you that this is an utterly inane solution, you are correct, but I think there's a lesson to be learned here, so let's have a look at how I came to it. With this method, the data would look like this: ND = " 0 0 1 0 0-1-1 0 0 1 1-1-1-1-1 1 1 1". Note the extra spaces, which are used to pad non-negative numbers to a string length of 2. This makes lookup much simpler. Now granted, this storage method is much more compact (it's only 38 bytes), but lookup is more complicated and verbose. It would look something like this: ND.slice(x*4,x*4+2),ND.slice(x*4+2,x*4+4), where x is the distribution we are looking at, from 0 to 8. Although slice might seem quite expensive, when I was employing this method I was caching the string 'slice' once and using bracket notation, like this: X='slice';ND[X](x*4,x*4+2). So overall it wasn't too terrible, and the extra bytes needed for lookup were still made up for by the relative compactness of the storage method. However, a quick jsperf revealed that this method was over 90% slower than a simple array lookup.

Perhaps some of you have by now come to the realization that took me far, far too long: this data can be stored in a single-dimensional array for a few extra bytes (43 total), but with the dual benefits of significantly faster lookup time, and of saving several bytes per lookup (ND[x*2] vs ND[X](x*4,x*4+2); keep in mind that the latter example would almost certainly have cached x*4, and in reality would be ND[X](x,x+2)). Additionally, another 10 bytes are saved by not having to cache 'slice'. Perhaps the worst part of this whole ordeal is how inordinately clever I thought I was being at the time. So the lesson here, if there is one, is that you're almost certainly never as clever as you think you are. If you get myopic about a problem and lock into your initial solution, you can close yourself off to the easier, more elegant solution.
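To make the comparison concrete, here's a sketch of the flat-array storage and lookup (with the single-letter loop variable expanded for clarity):

// Delta x and delta y for each of the nine distributions,
// flattened as [dx0, dy0, dx1, dy1, ...]
var ND = [0, 0, 1, 0, 0, -1, -1, 0, 0, 1, 1, -1, -1, -1, -1, 1, 1, 1];
for (var i = 0; i < 9; i++) {
  var dx = ND[i * 2];     // a plain array index...
  var dy = ND[i * 2 + 1]; // ...vs two slice calls per lookup with the string method
}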
There was also one more savings on data. Each DF has a weight associated with it. Zero has its own weight (four9ths=4/9), the cardinal directions share another weight (one9th=1/9), and the ordinal directions share yet another (one36th=1/36). Originally I was storing these weights in their own array. Of course this was very costly. Even though tacking them onto the node directions array saved a few extra bytes, there was still a lot of unnecessary repetition (one9th and one36th were each stored in four separate locations, each of those instances requiring an extra comma in the array). Since this data is only used once, in the equilibrium function, it is hard to justify spending so many bytes on storage. So it was a very obvious candidate for some form of simplification or compression. The most succinct method I was able to find was to use an if/else statement inside the loop over the DFs (i.e. from 0 to 8), whose test fails when the index is zero (so the else branch sets weight to 4/9), and which otherwise evaluates a conditional operator: if we are looking at velocities one through four, the weight is 1/9, otherwise it is 1/36. So, like this:
for (var i = 0; i < 9; i++) {
  if (i) {
    weight = (i < 5) ? 1/9 : 1/36;
  } else {
    weight = 4/9;
  }
}
Google's Closure Compiler further compacts this piece of logic in a way I myself probably wouldn't have thought of (weight=i?5>i?1/9:1/36:4/9). Overall, a significant byte savings was realized over storing this data in its own array.
Sound trumpets! let our bloody colours wave!
This is a brief point, but I think it's an important one. Visually, the program originally drew green 'waves' on a black background. In my opinion, this doesn't look too bad. Working with any more colors than this was pretty much off the table, as it would not have been within my byte budget, and there were many more things besides that would have taken precedence if I had the bytes to spare. For a long time, I was achieving this effect by setting the background style property of the canvas to black (at a cost of 26 bytes). Eventually, though, it became clear that this was too expensive, and would have to go. So for a while I was drawing green waves on a white background. This looked... less nice. I experimented with different colored waves (red, black, blue, everything... light blue), still on a white background, but they were all lacking.

Eventually I realized that I could achieve the exact same effect as before without using any extra bytes. I had been setting the green channel of each pixel to 255 (well, 600, actually, as I already had 600 cached in a variable anyway, to use for the width and height of the image... setting this well above the allowed maximum hasn't seemed to have any ill effects, and 2 bytes is 2 bytes), and varying the alpha channel based on speed. This has the effect of drawing each lattice node green, with the alpha channel being proportional to the speed at that node. Like this:
image.data[index + 1] = 255;   // Green channel
image.data[index + 3] = speed; // Alpha channel
Using this method, if a node has low or no speed, it is essentially transparent. In other words, the background color can and will shine through. Eventually I came to realize that if I swapped these (in other words, set the alpha channel to a constant, i.e. fully opaque, and varied the green channel proportionally with the speed at the node), I could achieve the same green-on-black effect I had wanted, but at absolutely no additional cost over what I was already using. This works primarily because the different channels default to 0. So when speed is 0, the node is colored {r: 0, g: 0, b: 0, a: 255} (black), and when speed is high, it is colored {r: 0, g: 255, b: 0, a: 255} (green).
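Concretely, the fix amounted to swapping two right-hand sides (a sketch; the constant is shown as 255 for clarity, where the actual program reused its cached 600):

// Before: constant green, alpha proportional to speed (the background shows through)
image.data[index + 1] = 255;   // green channel
image.data[index + 3] = speed; // alpha channel

// After: green proportional to speed, constant alpha (black wherever speed is 0)
image.data[index + 1] = speed; // green channel
image.data[index + 3] = 255;   // alpha channel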
Somewhat embarrassingly, this realization came when trying random color combinations. It was not a stroke of insight so
much as it was a happy accident. The takeaway here being, if something isn't working for you, there very well may be
more than one way to achieve the same effect. Don't give up on something just because your first attempt failed or was
too costly. Keep experimenting. You're almost certainly not so smart that you can't stumble your way into something, at
some point, that you couldn't have thought your way into.
The point of no returns
This was a small-ish savings, but if you look at my program, you may notice that there isn't a single return statement. Not one. After all, return is pretty costly. To return anything meaningful requires at least nine bytes, plus two more for assignment upon calling. Streaming and collision don't really require a return (they both manipulate the lattice array, which is in the global scope). The mouse function doesn't require one. The equilibrium function originally did return the equilibrium array, but ultimately it didn't require one either. To achieve this, an eq array was put into the global scope. When it is necessary to calculate the equilibrium of a node, the equilibrium function is called on its own, and on the next line the equilibrium values are accessed from the now-updated eq variable. In other words, it's all about the side effects. In the following example, the equilibrium values are calculated using the node's density and fixed x and y velocity values (0.1); the node's streaming array (which is just a place to store streaming values, to save us having to throw out and rebuild every single node on every single tick) is then set to the newly calculated equilibrium values.
// Here eq holds outdated values
equilibrium(0.1, 0.1, node.density); // eq values are up to date after this executes
node.stream = eq; // Set streaming array to new equilibrium values
The one important thing to note here is that, since the eq variable is shared quite promiscuously, it is imperative that a new array be created each and every time the equilibrium value is calculated; otherwise every node would share a reference to the same array, and madness would ensue. This is the inherent danger of mutable state, but there's no way around it if you want to trim some bytes.
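A minimal sketch of the pattern (the actual equilibrium math is elided; the loop body here is just a placeholder):

var eq; // global scratch array, shared by every caller
var node = { density: 1, stream: null }; // stand-in node for the example

function equilibrium(ux, uy, density) {
  eq = []; // crucially, a *new* array on every call
  for (var i = 0; i < 9; i++) {
    eq[i] = density / 9; // placeholder for the real per-DF calculation
  }
}

equilibrium(0.1, 0.1, node.density); // no return value...
node.stream = eq;                    // ...the caller reads the side effect instead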
I ain't got time to var
If you look carefully at the fully minified version of this program, you may notice something interesting. Every single variable is in the global scope. Every. Single. One. Well... aside from the arguments to the equilibrium and mousemove functions. But the point is, there isn't a single instance of var in the entire program. Now, Google's Closure Compiler doesn't rename globals, as this could cause serious issues, so this was one of the more difficult optimizations to achieve. Since the compiler would not have any qualms about using, for example, the local variable identifier a in two separate functions (as they would each belong to their own scope, neither would ever be in danger of overwriting the data of the other), it isn't possible to simply delete every var from the compiled program without risking dangerous name collisions. While this may work coincidentally in certain instances, it's not a technique that can be consistently relied upon. So in order to eliminate all local variables, I had to manually rename virtually all variables to single-letter identifiers, making certain that there were no name clashes. This was particularly difficult, as it required some careful bookkeeping to keep the different variables straight. It is probably best to perform this step as late as possible, once your program is functioning properly and the functionality has been more or less locked down, as once this is performed your program will become significantly more difficult to understand and follow, even in its uncompiled state. Before you reach this step, though, you can make this process much easier for yourself by using only unique, easily searchable, and easily mechanically replaceable identifiers for all your different variables. So, as an example, having the variables lattice and lattice_width could present problems if you were to search/replace lattice before lattice_width. Also, loop variables can generally be reused with impunity, although it's best to perform a quick sanity check to make certain there won't ever be conflicts before renaming. As an example, if you are using the loop variable i in both your update function and your stream function, and update were to call stream inside of a loop using this i variable, this could lead to problems. One other thing that was helpful was to put the entire program in an immediately invoked function expression. This provides your program with its own scope, and allows Google's Closure Compiler to freely rename variables therein.
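The wrapper itself is nothing exotic, just the standard pattern:

(function () {
  // Everything declared in here is local to the IIFE, so the Closure
  // Compiler is free to crush these names down to single letters.
  var lattice = [], lattice_width = 99;
  // ... the rest of the program ...
})();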
requestAnimationFrame, captain
Finally, it's important to know when you can't get away with trimming bytes. For a long time, I was using setTimeout instead of either the much superior but much more verbose requestAnimationFrame, or the probably not much better but slightly more verbose setInterval. I thought this was an easy 11 bytes. It worked pretty well in Chrome, which is what I was primarily developing in, but it put Firefox into an absolute fit. When it came time to test my program in Firefox, it took me a while to determine what the actual issue was, and I wasted a fair amount of time chasing red herrings. When I finally realized that requestAnimationFrame was not optional, it was pretty rough. The program was hovering right around 1K, and requestAnimationFrame sent it over the top. I had to make some hard decisions as a result, and I had to dig in even deeper to keep shaving off more bytes. If I had started with requestAnimationFrame, I still would have had to shave the same number of bytes, but I might have saved myself the mini heartache of going from ~1K up into the 1040 range. That was pretty demoralizing. Now, I'm not saying every program absolutely needs to use requestAnimationFrame. I believe it was only essential to this program because of the relatively high computational complexity of the algorithm; I'm sure a less taxing program could easily get away with setInterval. The important takeaway here, though, is that, whenever possible, you should find those things that are absolutely essential to your program as early as you can, and make them non-negotiable in your byte budget. So if you determine early on that you absolutely need requestAnimationFrame and a.onmousemove=function(){}, then you really only have 977 bytes to play with, not 1024.
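For reference, this is the shape of the main loop once requestAnimationFrame became non-negotiable (a sketch; stream, collide, and draw are stand-ins for the real per-tick work):

function tick() {
  // stream(); collide(); draw(); // the per-tick simulation work
  requestAnimationFrame(tick); // browser-scheduled frames; the setTimeout version is what choked Firefox
}
requestAnimationFrame(tick);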
Odds and bobs
A few random bits of advice:
- Set up a build/compile process early, especially if you're using a mechanical minification service like Google's Closure Compiler. They have an easy-to-use API, and a little regex knowledge should do the rest. I manually search/replaced variables and things far more often than I really should have before I set up a compile script (a rough sketch of such a script follows this list). I set mine up to send to Google's Closure Compiler, knock some variables off the response that I only kept around to keep the compiler from using those identifiers, wipe out the IIFE I was using for scoping, and then finally print the final length to the terminal. There were a few more steps I never bothered to automate, too, so I always knew that the reported length was going to be ~10 bytes higher than the fully minified program.
- If you dig through my repo, you'll find one commit message made early in the morning that reads "Had a few beers, somehow gained 36 bytes?!?; 1100 bytes". This was prior to reaching 1K, and at the time it was a pretty significant step towards that goal. It wasn't even that many beers, either. Who knows how many bytes could have been shed if it had been tequila. So, I guess the point is that it can't hurt to get a little drunk? Maybe? Or even just step away for a minute, take a walk, do whatever you need to do to take your mind off the problem, and you might return to it with a new perspective.
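For what it's worth, the core of such a compile script can be quite small. Here's a rough sketch in Node of posting source to the Closure Compiler web service (the endpoint and parameter names are from that service's REST API as I remember it; treat the details as assumptions, and the file name as hypothetical):

var https = require('https');
var querystring = require('querystring');
var fs = require('fs');

var body = querystring.stringify({
  js_code: fs.readFileSync('fluid.js', 'utf8'), // hypothetical source file name
  compilation_level: 'ADVANCED_OPTIMIZATIONS',
  output_format: 'text',
  output_info: 'compiled_code'
});

var req = https.request({
  hostname: 'closure-compiler.appspot.com',
  path: '/compile',
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' }
}, function (res) {
  var out = '';
  res.on('data', function (chunk) { out += chunk; });
  res.on('end', function () {
    // Post-processing goes here: strip the scoping IIFE, drop the
    // placeholder variables, and so on -- then report the byte count.
    console.log(out.length + ' bytes');
  });
});
req.end(body);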
Conclusion, or whatever
Though this project certainly had its low points, and there were a few moments when I seriously questioned whether what I had set out to do was even possible, ultimately I achieved my goal: I made a program that was far more functional and polished than I would have thought possible in such a small number of bytes, I learned a few interesting and useful things about JavaScript (and a couple of dirty, dirty hacks), and I got a different perspective on programming in general.
The French Oulipo are a group of writers and mathematicians who are primarily known for their constrained writing. Georges Perec, for example, wrote a 300-page lipogram novel, La disparition, in which the letter 'e' is never used. Like, at all. Ever. In French, the letter 'e' has a slightly higher frequency than it does in English, so this was no small feat. While it can be maddening at times, as the Oulipo realized, imposing an arbitrary constraint upon yourself can force you to focus more clearly on what you are doing, and can lead to brilliant insights you might never have realized otherwise, as well as a deeper understanding of the boundaries of the system you're working in.