Computational Fluid Dynamics in Under 1K: How I Made My JS1K Entry

Posted on February 17, 2014 by ebenpack

Being an attempt to write a computational fluid dynamics simulation using no more than two raised to the power ten bytes.

First, before I begin, N.B: apparently the extreme brevity required by this project has put me in rather a garrulous mood, so I apologize for the length of this post-mortem. If you’re interested in maybe learning a few byte shaving tricks for your own js1k, though, read on. Also, verbose variable names will be used for the sake of clarity, though keep in mind that all variables here have a single letter identifier in the final program.

What it is

This project, written for the JS1K competition (demo here), is essentially a complete rewrite of an earlier, much more feature rich project. You can have a look at the demo, which is probably more interesting than this 1K version, but which is an order of magnitude larger in the bytes department. If you’re not familiar with the lattice Boltzmann method, there’s a fairly nice introduction here (pdf), or you can read my explanation. The TL;DR version, though, is that this is a 2D computational fluid dynamics simulation. You can think of it as the surface of a pond that you’re dragging your finger through.

So what does this have to do with the contest’s theme, ‘here be dragons’? Well, nothing, really. It’s just an interesting problem I was working on recently, and I thought it would be fun to see if it was possible to achieve in 1K. So let’s get to it, shall we.

I started this project with a slightly modified version of the original program mentioned earlier. The original program clocked in just north of 20000 bytes, uncompressed, and the modified version was around about 4000 bytes. It fairly quickly became clear, though, that I wasn’t going to be able to cut the size by three-quarters, and a fresh start was required. I wasn’t quite so quick to accept this, though, and wasted a few commits trying to make it work. But once I had started fresh with just the core features (starting at about 1900 bytes), from there it was pretty rough sledding down to 1K. Strangely, as difficult as it was initially to trim away bytes, once I hit 1K the bytes seemed to keep melting off. The smallest size I achieved was 944 bytes, although this version was mostly an attempt to test the limits, and, while it ‘worked’, it had some serious issues. From this low-byte point, I began to add features and improve performance, while continuing to strip away any extra bytes I could. At this stage, the program oscillated between ~990 and ~1200 bytes. As I added each new feature, the size of the program would balloon up over 1K, and I then had to fret over whether to keep the feature, swap it for another one, or else find some other way to strip away a few spare bytes. If I can coin an analogy, the whole process was a bit like throwing tic-tacs and pennies out the window of your Chrysler K-car to make it go faster. Anyway, here’s a little of what I learned.

The things I’m most proud/ashamed of, or, speedups and speedbumps:

Flatten all the things

One of the early significant breakthroughs came with flattening every possible nested loop. As the lattice Boltzmann method makes use of a lattice (shocking, I know), it’s perhaps most natural to represent this with a multi-dimensional array. In this case, where we are working with a two-dimensional lattice, a two-dimensional array would be most appropriate. In JavaScript, this would be represented by an n-length array, each element of which being an m-length array, where n and m are the width and height respectively (in this program, width and height are equal, to save from having to cache both the width and the height; further, it was decided that width and height would be 99, as this saves a single byte vs a triple digit array size… seriously). Accessing this array would look like this: lattice[x][y]. The simplest way to loop over each of the elements of the lattice would be two nested loops, like so:

The lattice Boltzmann method requires looping over the array at least twice per tick, once for the streaming phase, and once for the collision phase (although it may be possible to do it in a single loop with some additional storage on each node and some more complicated logic to shuffle around distributions, I never looked into it closely enough to determine the feasibility of this option, so I leave it as an exercise for the reader). However, while it is necessary to iterate over the array at least twice, these iterations needn’t be performed with nested loops. It is possible to loop over any n-by-m array with a single loop. To do this, you loop from 0 to the total number of items in the the array (n*m), and determine the x and y coordinates on the fly, like so:

If it’s not clear why this works, think of it this way: y is increasing by one every time we go through another lattice_width values. This corresponds exactly to the row numbers. And x is cycling between 0 and lattice_width, which corresponds to the column numbers.

Although calculating the x and y coordinates does take a few extra bytes, the elimination of the inner loop more than makes up for this. For the most part, flattening these nested loops was fairly straightforward. However, I did have some problems flattening the draw loop. This loop iterates over a square region of the canvas image, and draws a colored square for each lattice node. Originally this loop looked like this:

The additional logic in each loop initialization and condition, and in calculating the index, made this one a little more difficult to figure out. At its core, though, this loop is merely iterating over a square region of the image. Since I decided to fix the width and height of the image that’s drawn to the canvas (which also simplified many other areas of the program and saved quite a few bytes), this loop eventually boiled down to this:

where 36 is the fixed area of the square to be drawn, and 6 is the width and height of that square (AKA px_per_node). You may recognize the method of calculating x and y from earlier. The rest of the logic merely calculates the image index, and is an implementation detail of the image data array. When I was working on this late at night, this extra logic confounded the problem immensely, but after breaking it down into its essential components it became clear enough.

A related trick that gained a few extra bytes was to flatten the two dimensional array representing the lattice into a one-dimensional array. This complicates lookup slightly (each node is accessed via lattice[x+y*width] instead of lattice[x][y]), and you may notice that, even when the width variable is squashed to a single letter, lookup with this method actually takes one more byte. The small extra lookup cost was worth it, however, as I was able to eliminate an if test during initialization, which was checking to see if the initialization loop had reached a new column in the array, and adding a new array if it had (which looked like this):

Flows of data more vast than anything the world has seen

If you’re trying to shave bytes, one of the most important things is efficient data storage. If you can eliminate the need for data storage altogether (e.g. with procedural generation), so much the better. But for this project, there was a small amount of data that was absolutely critical, and which, so far as I can tell, cannot be succinctly programmatically generated. These were, namely, the velocities associated with each distribution function (DF) and the distribution function weights. In the lattice Boltzmann method, each node has a number of distribution functions, each representing a distribution of particle densities. In this program (which uses the D2Q9 discretization… which just means a two-dimensional lattice, with 9 velocities per node), each node has nine of these distribution functions. These are numbered from zero to eight. Zero represents the ‘at-rest’ velocity, one through four represent the cardinal direction velocities, and five through eight represent the ordinal direction velocities. In order for these distributions to stream in their direction of travel (i.e. move from one node another), each must have some notion of what that direction is. Originally I had stored these directions as an 8-by-2 array, where each inner array represented the delta x and y for its respective distribution to travel. This looked something like this: ND = [[0,0],[1,0],[0,-1],[-1,0],[0,1],[1,-1],[-1,-1],[-1,1],[1,1]]. So, for example, the ‘6’ distribution travels -1 in the x direction, and -1 in the y direction. For a node at coordinates (100,100), after streaming the ‘6’ distribution from (100,100) would end up at coordinates (99,99). You may notice that the data above contains a fair number of non-data characters in the form of brackets and commas. Altogether, this array takes up 61 bytes. My initial solution to reduce the size of this data, which was one of my largest blunders on this project, was to represent this data as a string. I will give you a moment to allow that to sink in. If it strikes you that this is an utterly inane solution, you are correct, but I think there’s a lesson to be learned here, so let’s have a look at how I came to it. With this method, the data would look like this ND = " 0 0 1 0 0-1-1 0 0 1 1-1-1-1-1 1 1 1". Note the extra spaces, which are used to pad non negative numbers to a string length of 2. This makes lookup much simpler. Now granted, this storage method is much more compact (it’s only 38 bytes), but lookup is more complicated and verbose. It would look something like this ND.slice(x*4,x*4+2),ND.slice(x*4+2,x*4+4), where x is the distribution we are looking at from 0 to 8. Although slice might seem quite expensive, when I was employing this method I was caching the string ‘slice’ once and using bracket notation, like this: X='slice';ND[X](x*4,x*4+2), so overall it wasn’t too terrible, and the extra bytes needed for lookup were still made up by the relative compactness of the storage method. However, a quick jsperf revealed that this method was over 90% slower than a simple array lookup. Perhaps some of you have by now come to the realization that took me far, far too long. This data can be stored in a single dimensional array for a few extra bytes (43 total), but with the dual benefits of significantly faster lookup time, and of saving several bytes per lookup (ND[x*2] vs ND[X](x*4,x*4+2); keep in mind that the latter example would almost certainly have cached x*4, and in reality would be ND[X](x,x+2)). Additionally, another 10 bytes are saved by not having to cache ‘slice’. Perhaps the worst part of this whole ordeal is how inordinately clever I thought I was being at the time. So the lesson here, if there is one, is that you’re almost certainly never as clever as you think you are. If you get myopia about a problem and lock into your initial solution, you can close yourself off to the easier, more elegant solution.

There was also one more savings on data. Each DF has a weight associated with it. Zero has its own weight (one9th=1/9), the cardinal directions share another weight (four9ths=4/9), and the ordinal directions share yet another (one36th=1/36). Originally I was storing these velocities in their own array. Of course this was very costly. Even though tacking them onto the node directions array saved a few extra bytes, there was still a lot of unnecessary repetition (four9ths and one36th were each stored in four separate locations, each of those instances requiring an extra comma in the array). Since this data is only used once, in the equilibrium function, it is hard to justify spending so many bytes on storage. So it was a very obvious candidate for some form of simplification or compression. The most succinct method I was able to find was to use an if/else statement inside the loop over the DFs (i.e. from 0 to 8), which fails if zero (thus setting else weight to 4/9), and otherwise evaluates a conditional operator. If we are looking at velocities one through four, the weight is 1/9, otherwise it is 1/36. So like this:

Google’s closure compiler further compacts this piece of logic in a way I myself probably wouldn’t have thought of (weight=i?5>i?1/9:1/36:4/9). Overall a significant byte savings was realized over storing this data in its own array.

Sound trumpets! let our bloody colours wave!

This is a brief point, but I think it’s an important one. Visually, the program originally drew green ‘waves’ on a black background. In my opinion, this doesn’t look too bad. Working with any more colors than this was pretty much off the table, as it would not have been within my byte budget, and there were many more things besides that would have taken precedence if I had the bytes to spare. For a long time, I was achieving this effect by setting the background style property of the canvas to black (at a cost of 26 bytes). Eventually, though, it became clear that this was too expensive, and would have to go. So for a while I was drawing green waves on a white background. This looked… less nice. I experimented with different colored waves—red, black, blue, everything… light blue—still on a white background, but they were all lacking. Eventually I realized that I could achieve the exact same effect as I had been without using any extra bytes. I had been setting the green channel of each pixel to 255 (well, 600, actually, as I already had 600 cached in a variable anyway, to use for the width and height of the image… setting this well above the allowed maximum hasn’t seemed to have any ill effects, and 2 bytes is 2 bytes), and varying the alpha channel based on speed. This has the effect of drawing each lattice node green, with the alpha channel being proportional to the speed at that node. Like this:

Using this method, if a node has low or no speed, it is essentially transparent. In other words, the background color can and will shine through. Eventually I came to realize that if I swapped these (in other words, set the alpha channel to a constant (i.e. fully opaque), and vary the green channel proportionally with the speed at the node), that I could achieve the same green on black effect I had wanted, but at absolutely no additional cost over what I was already using. This works primarily because the different channels are defaulted to 0. So when speed is 0, the node would be colored {r: 0, g: 0, b: 0, a: 255} (black), and when speed is high, it would be {r: 0, g: 255, b: 0, a: 255} (green). Somewhat embarrassingly, this realization came when trying random color combinations. It was not a stroke of insight so much as it was a happy accident. The takeaway here being, if something isn’t working for you, there very well may be more than one way to achieve the same effect. Don’t give up on something just because your first attempt failed or was too costly. Keep experimenting. You’re almost certainly not so smart that you can’t stumble your way into something, at some point, that you couldn’t have thought your way into.

The point of no returns

This was a small-ish savings, but if you look at my program, you may notice that there isn’t a single return statement. Not one. After all, return is pretty costly. To return anything meaningful requires at least nine bytes, plus two more for assignment upon calling. Streaming and collision don’t really require a return (they both manipulate the lattice array which is in the global scope). The mouse function doesn’t require one. The equilibrium function originally did return the equilibrium array, but ultimately it didn’t require one either. To achieve this, an eq array was put into the global scope. When it is necessary to calculate the equilibrium of a node, the equilibrium function is called on it’s own, and on the next line the equilibrium values are accessed from the now updated eq variable. In other words, it’s all about the side-effects. In the following example, the equilibrium values are calculated using the node’s density and fixed x and y velocity values (0.1); the node’s streaming array (which is just a place to store streaming values, to save us having to throw out and rebuild every single node on every single tick) is then set to the recently calculated equilibrium values.

The one important thing to note here is that, since the eq variable is shared quite promiscuously, it is imperative that a new array be created each and every time the equilibrium value is calculated, otherwise every node would share a reference to the same array, and madness would ensue. This is the inherent danger of mutable state, but there’s no way around it if you want to trim some bytes.

I ain’t got time to var

If you look carefully at the fully minified version of this program, you may notice something interesting. Every single variable is in the global scope. Every. Single. One. Well… aside from the arguments to the equilibrium and mousemove functions. But the point is, there isn’t a single instance of var in the entire program. Now, Google’s closure compiler doesn’t rename globals, as this could cause serious issues, so this was one of the more difficult optimizations to achieve. Since the compiler would not have any qualms about using, for example, the local variable identifier a in two separate functions (as they would each belong to their own scope, neither would ever be in danger of overwriting the data of the other), it isn’t possible to simply delete all var’s from the compiled program without risking dangerous name collisions. While this may work coincidentally in certain instances, it’s not a technique that can be consistently relied upon. So in order to eliminate all local variables, I had to manually rename virtually all variables to single letter identifiers, making certain that there were no name clashes. This was particularly difficult, as it required some careful bookkeeping to keep the different variables straight. It is probably best to perform this step as late as possible, once your program is functioning properly and the functionality has been more or less locked down, as once this is performed your program will become significantly more difficult to understand and follow, even in its uncompiled state. Before you reach this step, though, you can make this process much easier for yourself by only using unique, easily searchable, and easily mechanically replacable identifiers for all your different variables. So as an example, having the variables lattice and lattice_width could present problems if you were to search/replace lattice before lattice_width. Also, loop variables can generally be reused with impunity, although it’s best to perform a quick sanity check to make certain there won’t ever be conflicts when reusing them before renaming. As an example, if you are using the loop variable i in both your update function and your stream function, if update were to call stream inside of a loop using this i variable, this could lead to problems. One other thing that was helpful was to put the entire program in an immediately invoked function expression. This provides your program with its own scope, and allows Google’s closure compiler to freely rename variables therein.

requestAnimationFrame, captain

Finally, it’s important to know when you can’t get away with trimming bytes. For a long time, I was using setTimeout instead of either the much superior but much more verbose requestAnimationFrame or the probably not much better but slightly more verbose setInterval. I thought this was an easy 11 bytes. It worked pretty well in chrome, which is what I was primarily developing in, but it put firefox into an absolute fit. When it came time to test my program in Firefox, it took me a while to determine what the actual issue was, and I wasted a fair amount of time chasing red herrings. When I finally realized that requestAnimationFrame was not optional, it was pretty rough. The program was hovering right around 1K, and requestAnimationFrame sent it over the top. I had to make some hard decisions as a result, and I had to dig in even deeper to keep shaving off more bytes. If I had started with requestAnimationFrame, I still would have had to shave the same number of bytes, but I might have saved myself a mini heartache of going from ~1K up into the 1040 range. That was pretty demoralizing. Now, I’m not saying every program absolutely needs to use requestAnimationFrame. I believe it was only essential to this program because of the relatively high computational complexity of the algorithm. I’m sure a less taxing program could easily get away with setInterval. The important takeaway here, though, is that, whenever possible, you should find those things that are absolutely essential to your program as early as you can, and make them nonnegotiable in your byte-budget. So if you determine early on that you absolutely need requestAnimationFrame and a.onmousemove=function(){}, then you really only have 977 bytes to play with, not 1024.

Odds and bobs

A few random bits of advice:

  • Set up a build/compile process early, especially if you’re using a mechanical minification service like Google’s closure compiler. They have an easy to use API, and a little regex knowledge should do the rest. I manually search/replaced variables and things far more often than I really should have before I set up a compile script. I set mine up to send to Google’s closure compiler, knock some variables off the response that I only kept around to keep the closure compiler from using those identifiers, wipe out the IIFE I was using for scoping, and then finally print the final length to the terminal. There were a few more steps I never bothered to automate, too, so I always knew that the reported length was going to be ~10 bytes higher than the fully minified program.

  • If you dig through my repo, you’ll find one commit message made early in the morning that reads “Had a few beers, somehow gained 36 bytes?!?; 1100 bytes”. This was prior to reaching 1K, and at the time it was a pretty significant step towards that goal. It wasn’t even that many beers, either. Who knows how many bytes could have been shed if it had been tequila. So, I guess the point is, that it can’t hurt to get a little drunk? Maybe? Or even just step away for a minute, take a walk, do whatever you need to to take your mind off the problem, and you might return to it with a new perspective.

Conclusion, or whatever

Though this project certainly had its low points, and there were a few moments when I seriously questioned whether what I had set out to do was even possible, ultimately I achieved my goal: I made a program that was far more functional and polished than I would have thought possible in such a small number of bytes, I learned a few interesting and useful things about JavaScript (and a couple of dirty, dirty hacks), and I got a different perspective on programming in general.

The French Oulipo are a group of writers and mathematicians who are primarily known for their constrained writing. Georges Perec, for example, wrote a 300 page lipogram novel, La disparition, in which the letter ‘e’ is never used. Like, at all. Ever. In French, the letter ‘e’ has a slightly higher frequency than it does in English, so this was no small feat. While it can be maddening at times, as the Oulipo realized, imposing an arbitrary constraint upon yourself can force you to focus more clearly on what you are doing, and can lead to brilliant insights you might never have realized otherwise, as well as a deeper understanding of the boundaries of the system you’re working in.