Tuesday, October 15, 2013

Traffic flow and merging during congestion

A little while ago there was this article about fluid dynamics applied to traffic. Then a couple of weeks ago I saw this article pop up on Hacker News pointing out logical errors in the first article. I feel like we still don’t quite have the complete picture here. Both are excellent articles and both speak the truth to some degree or another. I just have a couple of points to make. Please note that I’m assuming you have at least skimmed each of those articles.

If you just want to know what I’m getting at here, jump to the conclusion.

Traffic Flow

The first article proposes that to prevent traffic jams during times of congestion, we need to slow down far enough in advance that we can maintain a constant speed, so that by the time we finally reach the tail end of a “traffic wave” it is already beginning to move again. In so doing we can effectively “eat” a traffic wave, avoid stop-and-go traffic and fix traffic jams. This is true. Sort of.

After reading that article I tried it during my regular commute to and from work and I managed to “eat” several traffic waves and maintain a constant speed without having to stop like the cars in front of me.

I can’t speak to what happened to the cars behind me, but the second article makes the argument that by following that proposal you are actually making things worse for the cars behind you. It asserts that, given an average two-second following distance, “there is a limit to the number of cars that can pass by a given point on the highway in a given amount of time, and that limit is one car every 2 seconds, per lane”. Do the math and that works out to at most 1,800 cars per hour, per lane, past any given point. Nothing you can do will change that, and therefore you cannot fix a traffic jam. This is also true. Sort of.

In reality, it is very dependent on traffic density. If you are in a bumper-to-bumper traffic jam where all the cars are close together and trying to squish past a certain point, then the second article is correct.

If, however, traffic is less dense, then the first proposal can actually make a difference. Why? Well, you have to account for the time it takes for a car to accelerate. If you just continue riding the bumper of the car in front of you and come to a stop with every traffic wave, then you are sustaining the wave, because you (and everyone behind you) now have to stop and then spend extra time accelerating. If you back off a bit and maintain a constant speed that leaves a large enough gap to the car in front of you, then you “eat” the traffic wave and absorb its energy. You no longer have to take the extra time to accelerate, and neither do the cars behind you. You also save on gas. The trick, of course, is to do this without causing a traffic wave behind you. Don’t try this in super dense traffic! You will only make things worse.

Traffic Merging

With merging there is yet more to consider. In the comments on the second article there is a comment from a man talking about how, when there is a sign indicating that traffic is merging ahead, he will get over quickly to be courteous or “pro-social”. This is actually a bad idea for a couple reasons.

So, let’s assume that there is construction on a two-lane road two miles ahead and one lane is merging. If most people start merging right away then there is a lot of unused space in the now-empty lane that everyone is merging from. As the second article points out, this only serves to lengthen the distance of the traffic jam. It effectively doubles its length and increases the chance that the jam will “spill out” and affect traffic elsewhere needlessly.

The other problem, of course, is that those anti-social, selfish “jerks” (like me) who zoom past in the now-empty lane and merge as late as possible, saving themselves from the delay, end up causing even greater delays for those who were courteous. There’s now yet another car that the “pro-social” and “courteous” people have to wait for before they finally get their turn through the bottleneck.

The problem is that no matter where you merge, you still take up space and make the jam worse. Whether you merge as late as possible or you merge early, you are lengthening the commute of all the people behind you.

The only thing you accomplish by being “pro-social” is doing yourself and those in the lane you merge into a disservice. YOU are a delay. Merging early doesn’t remove any of that delay; it just distributes it unevenly, shifting it from the lane you are leaving onto the lane you are joining. This gives those still in the merging lane an advantage, and it comes at the cost of the very people you are trying to be pro-social or courteous to.

If everyone waited to merge until they had to, then everyone would share a similar delay because the delay would be evenly distributed. The way to be courteous and considerate of everyone is to wait until the last minute to merge if you are in the merging lane, or to let exactly one car in ahead of you at the merge point if you are in the lane being merged into. This creates the most even distribution.

Conclusion

When you have to merge in dense traffic, please, please do everyone a favor and merge as late as possible. It’s the most courteous and pro-social thing you can do.

Monday, July 8, 2013

Stop telling me to go to college!

UPDATE 2/21/2014: Ryan Higa made an excellent parody that more or less makes my point.

UPDATE 2/23/2014: This article, covering Google’s perspective, was published today. It summarizes my point very well:

Your degree is not a proxy for your ability to do any job. The world only cares about — and pays off on — what you can do with what you know (and it doesn’t care how you learned it).

The original article:
I don’t have a degree. I started into some college while working full time as a software developer. I was killing myself trying to go to school and work. I didn’t get too far before I started asking myself questions like “What am I doing? Why am I killing myself to get a degree so I can hopefully get a job like what I already have?” It’s not like I would get paid better or have a better position. Despite this, time and time again I’ve had people tell me about how important it is to go to college and get a degree. That said, I think I’ve had more people agree with me overall.

I recently had the college debate again with someone who just graduated in Computer Science. Initially the argument was that you can’t get as good of a job or get paid as well without a degree. This is the typical rhetoric I hear from people who have spent too much money and time being fed this propaganda by the universities and the public school system. It doesn’t add up - at least not in the IT industry, and it seems more and more questionable in other fields as well, with tuition costs rising and the economy sinking. Programmers are supposed to have a good grasp on logic and reasoning. I’ve debated the necessity of going to college countless times with people, but it really bothers me that a Computer Science graduate would so readily ignore reason.

I think the problem we have is that so many people equate college with education. Now, don’t get me wrong. I’m a HUGE fan of education. When I have free time I often spend it learning new things from reading Hacker News. I’ve been guilty more than a few times of getting stuck hour after hour clicking through links on Wikipedia - it’s a nerd trap! Given the choice between reading a novel or a textbook, I’d honestly choose the textbook. But college isn’t the only way to get an education, and not everyone who goes to college actually gets a great education.

Here’s a little background on myself. I started learning HTML and CSS just before I turned 10. I made lots of crappy websites that (thankfully) never saw the light of day, as well as a couple that did. I’ve used one Linux distro or another for my main desktop OS since I was 12. When I was 14 I spent 3 weeks doing a stage 1 install of Gentoo and it was an incredible learning experience. I ran Gentoo for many years on many computers after that. I used to hang out on IRC in #gentoo. I remember once, when I was still 14, helping someone with some JavaScript he was doing for work. I wasn’t even old enough to have a job yet. By the time I was 16 I had gotten my CompTIA Linux+ certification, I had jumped into PHP and C++, and I was going to an applied technology center where I covered C# and Java, design patterns and algorithms, etc… By 18 I had spent a lot of time in Python and I had won several programming competitions.

I’m not saying this to brag or be arrogant or whatever, but just to illustrate my point. Despite whatever time I spent in school, I spent far more time at home developing my skills. I was the stereotypical socially inept, acne-covered, four-eyed computer nerd living in my parents’ basement.

The guy I was debating with was telling me that I could only be qualified to make good money with a CompSci degree. As we compared notes, so to speak, I discovered that I know a lot more than him, I have a lot more experience than him (personal and professional), and I make a lot more money than him.

I’m glad companies out there recognize the value of knowledge and experience over degrees. When my company hired me they told me that they’ve had a lot of success with people that don’t have degrees - not that they haven’t also had success with people that have degrees, but it’s not required. I recently interviewed with a company that told me that my interview was the best they’d had in 3 years (which actually worried me a bit about the type of people I’d have to work with). They offered me six figures, even without a degree (crazy, huh? *sarcasm*). It wasn’t quite a good enough offer to leave the awesome company I work for now, but it illustrates the point I’m making.

I haven’t been out looking for another job, but I get contacted quite frequently. In the past year or so I’ve talked to a lot of companies and only one seemed to care much about a degree, despite most of them listing a degree as a requirement in their job listings. Quite frankly though, I probably wouldn’t accept a job with a company that insisted on a degree, even if I had one. That tells me that they’re more concerned about how you look on paper than how you actually perform. They would turn away good talent over a meaningless qualification. I fear that would mean working with less-intelligent people.

Anyway, let’s get back to the story. This guy ultimately resorted to something like, “I just think college is important. It’s just the way I was raised.” No logic, just insistence. I feel like this is a major problem with the mentality of society today. This is how all the universities out there want you to think and this is how they want you to raise your children. They’ve done a pretty good job of convincing most of American society that you get paid more with a degree and that without a degree you can’t be successful. It feeds the beast and the beast wants to be fed. Anyone who dares to question the notion is a threat to their business.

I recognize there are fields where a degree or license is required. I also recognize that there are people who lack ambition or need the structure or whatever else of a university. More power to ya! I don’t care if anybody wants to go to college, but please, PLEASE stop telling me that it’s the only way to be successful, or that people that go are somehow superior, or that I just need to go. Stop telling me to give up a great job and go rack up lots of debt to effectively learn very little more than I already know or can figure out myself, all just to get a job like the one I’ve already got.

If you are passionate and ambitious, then try Google and Wikipedia and Stack Overflow - the free online universities! There really is very little you can’t find online already. And if you can’t, then ask a question somewhere. The internet is filled with people who know things about stuff!

The Point

If you want to go to college that’s fine; but make sure you know why you’re doing it. If the only reason you have is “because that’s how I was raised/taught” then maybe you should rethink your decision.

Friday, June 7, 2013

Why you should reinvent the wheel

This applies to pretty much anything but I’m going to apply it specifically to programming since that’s my field. Have you ever heard someone ask, “Why reinvent the wheel?” as they begin to instruct a new developer on why they shouldn’t undertake a new project that’s been done a thousand times before by other people? That’s always bothered me. When I was learning to program I used to “reinvent the wheel” all the time! Honestly, I hated just using someone else’s code without having taken a crack at it myself.

I always thought I could do it better somehow. I thought I could make it more efficient, add some new feature or somehow simplify it. If nothing else, at least I would know how it worked and could fix bugs myself. I was stubborn for a while after jQuery came about and wouldn’t use it because I always wanted to write my own library to do something like that. I hated using code that wasn’t mine. None of my code from those years ever saw the light of day. I’ve still got some of it sitting around though. It’s interesting to take a look back at it sometimes and see what I was thinking or see how I did certain things. Frankly, I would probably be embarrassed to show the code to anyone now. The libraries that are out there have been done so much better.

If you’re not particularly passionate about programming, it’s tempting to just take someone’s black-box library, use their API, and assume it runs on magic more powerful than your own! It’s not like you NEED to understand how it works to use it, right? It’s less work you have to do and it gets the job done.

Problem is, if that’s your mentality you won’t go far. I always insisted on seeing inside the black box and figuring out how it works, then re-creating it. Of course, when I would re-create it I would always try to improve it. Heck, back when I was 10 years old trying to learn HTML I was constantly viewing the source on websites across the internet because I had to figure out how they did what they did. Then I would go and try to do it myself. “Reinventing the wheel” was absolutely critical for me in those early years because that’s how I learned to program.

And, that’s how innovation is done every day! You can’t improve something if you don’t even understand how it works. Once you do, you create it again, but better!

For me, it’s paid off. I’m 24 with an awesome job working on a pretty advanced web application with cutting-edge technologies and pioneering new solutions in the web application arena. I’m lucky to work for a company that lets me do this. A great example is what’s happening at work right now. We originally tried using SlickGrid in our application and had to pore over the source code and write lots of little hacks to make it do what we needed. Unfortunately, it just can’t handle what we’re throwing at it. Now, we’re reinventing it - but waaay better this time!

We’re actually about to open source the realtime JavaScript framework we’ve been building on top of Backbone. It includes many examples of innovation and lessons learned from other frameworks and libraries we’ve studied or worked with in the process of developing our product. We could have just gone with what was out there at the time we started (which wasn’t much, to be honest[1]), but that would have defined and limited what we could achieve. Instead, we’ve now created a system unlike any out there to use in the newsroom at KSL and Deseret News. I’m excited to publicly launch the framework we built it on. Sorry about the little tangent there :)

We need people that challenge the assumptions made and the patterns used in the things we take for granted in our everyday lives. That goes for any industry. How will we ever have a better wheel if no one tries to reinvent it, or worse, if no one even understands how it works? Who says a wheel is even the best tool for the job?

So why reinvent the wheel?

  1. To learn
  2. To make a better wheel
  3. Because the wheel you have doesn’t solve your needs or wants

The End.


  [1] At the time we started our framework Backbone was all the rage. AngularJS and Ember.js were just starting and had no community. SproutCore was on its way out. ExtJS had been around for a while, but it was slow, had bugs and didn’t allow us the flexibility we wanted - it’s a monolithic framework. While our system bears some interesting conceptual similarities in a few parts with Meteor, Meteor didn’t even exist at the time.

Tuesday, April 9, 2013

Exceptions and resource leaks in JavaScript and Node.js

A while ago I read a comment in the Node.js documentation on uncaughtException mentioning that it might be removed (no longer true), that it is a crude mechanism for exception handling that shouldn’t be used, and that domains should be used instead. So I made a mental note and figured I’d look into domains when I got the chance. This past week I started looking into it a bit and noticed some more concerning comments in the documentation for domains:

By the very nature of how throw works in JavaScript, there is almost never any way to safely “pick up where you left off”, without leaking references, or creating some other sort of undefined brittle state.

In a comment in the code block a little below that, which demonstrates the bad practice of using a domain as a global catch-all (like uncaughtException), it states:

The error won’t crash the process, but what it does is worse! Though we’ve prevented abrupt process restarting, we are leaking resources like crazy if this ever happens.

They recommend that when you catch exceptions in a global way like this, you always shut down and restart the process. My first reaction was something like “Whoa! Hold on there, Skippy!” I’ve written a server that handles a lot of persistent websocket connections, and I can’t just shut down because I didn’t handle some exception thrown while handling a misbehaving client. And what the heck is it talking about with this whole “leaking resources like crazy” thing anyway?

So naturally I did some Googling to see if I could figure out what it’s talking about. Unfortunately I couldn’t find anything useful. I posted a question on Stack Overflow, though, which was helpful. While the answers there do seem kind of obvious in retrospect, they weren’t obvious to me at the time, because the resources (open files and sockets) mentioned in the answers don’t apply to the server I’ve written (another reason why it’s bad to blindly tell everyone they’re leaking resources).

My purpose in writing this post is mostly to help answer the question I had about what resources might be leaking and how to write code that takes this into account. The Node.js documentation is misleading and vague on this point. The only time it’s definitely bad to continue after an exception is if you literally have no idea where it was thrown and what resources might have been in use (e.g. in a global exception handler).

When an exception is thrown while you have an open stream (file, socket, etc.), the code that follows - the code that would normally finish with the resource and close it - never runs. The resource is left open and you are now leaking it. In other words, any time an exception is thrown in a block of code where resources are open, you are potentially introducing a leak.
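
Here’s a rough sketch of the kind of leak I’m talking about (riskyParse is just a made-up stand-in for any code that might throw):

var fs = require('fs');

function readConfig(path) {
    var fd = fs.openSync(path, 'r');  // resource acquired
    var config = riskyParse(fd);      // hypothetical parser - if this throws...
    fs.closeSync(fd);                 // ...this never runs, and the file descriptor leaks
    return config;
}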

The only way to make sure you aren’t leaking resources is on a case-by-case basis… sort of.

The problem with exceptions can be handled pretty well with domains. However, you run into the exact same problem if you simply forget to close a resource or return too early, while it’s still open.

Some Solutions

Ideally you shouldn’t have unhandled exceptions. We don’t live in an ideal world, but it’s still good to handle as many as you can. If you catch the exception close enough to where it occurred that you know what resources are in use and can clean them up properly, then it’s all right to continue. If you write code that throws exceptions, keep in mind what resources are open at the time you throw and try to clean them up before throwing.
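
For synchronous code, that usually just means wrapping the risky work in a try/finally so the cleanup runs no matter how the function exits. A minimal sketch, reusing the made-up riskyParse from above:

var fs = require('fs');

function readConfig(path) {
    var fd = fs.openSync(path, 'r');
    try {
        return riskyParse(fd);  // may throw
    } finally {
        fs.closeSync(fd);       // runs on a normal return AND when an exception is thrown
    }
}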

You can use domains in Node.js to add some context and specificity to your error handling. For example, if you were writing an HTTP server, you could use a domain to handle the case where an exception is thrown while processing a request (even if it’s doing asynchronous work) and then at least shut down the open socket from the client and clean up the request a bit. Otherwise, since an exception was thrown, no response would be sent and the request would be left open until it timed out.
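
Here’s a rough sketch of what that might look like (handleRequest is a placeholder for whatever routing and asynchronous work your server actually does):

var domain = require('domain');
var http = require('http');

http.createServer(function(req, res) {
    var d = domain.create();
    d.add(req);
    d.add(res);
    d.on('error', function(err) {
        console.error('Error while handling request:', err.stack);
        try {
            // At least answer the client and free the socket instead of letting it time out
            res.writeHead(500);
            res.end('Internal Server Error');
        } catch (e) {
            // The response itself may be unusable; just kill the connection
            req.socket.destroy();
        }
    });
    d.run(function() {
        handleRequest(req, res);  // placeholder for your actual request handling
    });
}).listen(8080);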

If you are writing a server or process that can fork or spawn workers (such as a request handler for a web server or something else) then the problem can be largely mitigated by simply shutting down the worker (hopefully gracefully) and spawning new workers as needed.
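
For example, with the built-in cluster module the master can replace a worker that decided to die after an unhandled exception. A simplified sketch:

var cluster = require('cluster');
var http = require('http');

if (cluster.isMaster) {
    cluster.fork();
    cluster.on('exit', function(worker) {
        console.log('worker ' + worker.process.pid + ' died, spawning a replacement');
        cluster.fork();
    });
} else {
    var server = http.createServer(function(req, res) {
        res.end('ok');
    }).listen(8080);

    process.on('uncaughtException', function(err) {
        console.error(err.stack);
        // Stop accepting new connections, let in-flight requests finish, then exit
        server.close(function() { process.exit(1); });
        setTimeout(function() { process.exit(1); }, 5000).unref();
    });
}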

An Ideal Solution

Though it doesn’t exist in JavaScript, an ideal solution would be something like Python’s context managers. They are a powerful tool for managing resources. As mentioned here:

The advantage of using a with statement is that it is guaranteed to close the file no matter how the nested block exits. If an exception occurs before the end of the block, it will close the file before the exception is caught by an outer exception handler. If the nested block were to contain a return statement, or a continue or break statement, the with statement would automatically close the file in those cases, too.

Using the with statement in Python (for those unfamiliar) looks like this:

with open('/path/to/file', 'w') as f:
    f.write('some text')

That’s it! There is no need to bother closing the file. Anything that needs to be done with the file is done within the with block. When the block exits, the context manager closes the file. And the best part is that you can write your own context manager, just like open() in this example! You could write one for handling sockets, including sockets that are already open.

Though there is nothing like this in JavaScript, I think it’d definitely be a nice addition. Even if it were never to become a part of JavaScript itself, it would be a particularly nice addition to Node.js. Unlike JavaScript in the web browser, JavaScript on the server handles a lot more resources like this (open files and sockets), so it is much more important to be able to handle them in an elegant and clean way without having to shut down every time an unhandled exception occurs.
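
You can approximate the pattern for synchronous code today with a small helper that guarantees cleanup in a finally block. This is just a sketch (withFile is my own made-up helper), and it only covers the synchronous case:

var fs = require('fs');

// A rough, synchronous stand-in for Python's "with open(...) as f:"
function withFile(path, flags, fn) {
    var fd = fs.openSync(path, flags);
    try {
        return fn(fd);
    } finally {
        fs.closeSync(fd);  // closed on normal return, early return, or exception
    }
}

withFile('/path/to/file', 'w', function(fd) {
    fs.writeSync(fd, 'some text');
});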

That said, I don’t know exactly how they would implement something like context managers with the asynchronicity of JavaScript. I suppose when there are no more references to a resource it’s essentially “out of scope”, and the exit on a context manager could be called to close the resource. In that case, though, it would have to run as part of the garbage collection process, and you really cannot know when that will happen, which could produce other unpredictable and/or undesirable behavior.

Conclusion

Keep in mind that this post only really addresses concerns relating to leaking resources, not the brittle state of the application after an unhandled exception that is caught for the first time globally, and thus without context. It is difficult in that situation to know the state of the application. Whatever function threw the exception (which you don’t know, since you have no context) could have modified shared state and not finished its work, leaving things in an unknown state. Global exception handlers really should only be used for logging uncaught exceptions so you can fix them, and for shutting down gracefully. That said, it’s not impossible, especially in smaller systems, to verify from a global exception handler that the system is still in a sane state.

For an interesting and brief rundown of things to consider with exception handling, check out this explanation from an answer to my SO question. Trust me, you want to click that link!

Friday, February 22, 2013

Make your media keys work with Spotify in Linux

This will be a quickie. I just wanted to document how I made my media keys work for Spotify in Linux. I’m going to give instructions for Ubuntu, but it shouldn’t be hard to adapt them to any Linux distro.

First install xbindkeys:

apt-get install xbindkeys

Next, put this in a file named .xbindkeysrc in your home directory (e.g. vim ~/.xbindkeysrc):

"dbus-send --print-reply --dest=org.mpris.MediaPlayer2.spotify /org/mpris/MediaPlayer2 org.mpris.MediaPlayer2.Player.PlayPause" XF86AudioPlay
"dbus-send --print-reply --dest=org.mpris.MediaPlayer2.spotify /org/mpris/MediaPlayer2 org.mpris.MediaPlayer2.Player.Stop" XF86AudioStop
"dbus-send --print-reply --dest=org.mpris.MediaPlayer2.spotify /org/mpris/MediaPlayer2 org.mpris.MediaPlayer2.Player.Previous" XF86AudioPrev
"dbus-send --print-reply --dest=org.mpris.MediaPlayer2.spotify /org/mpris/MediaPlayer2 org.mpris.MediaPlayer2.Player.Next" XF86AudioNext

To test, open a command line and run:

xbindkeys &

Finally, make sure xbindkeys is started when you log in by opening your “Startup Applications” settings and adding the xbindkeys command. You can also do this from the command line by creating the file ~/.config/autostart/xbindkeys.desktop with the following contents:

[Desktop Entry]
Type=Application
Exec=xbindkeys
Hidden=false
NoDisplay=false
X-GNOME-Autostart-enabled=true
Name[en_US]=xbindkeys
Name=xbindkeys
Comment[en_US]=
Comment=

If it’s not working for you, it’s probably because xbindkeys isn’t picking up the keys. To test this, run: xbindkeys -k

Then press the pause/play button. If xbindkeys just sits there and doesn’t recognize it, then this is your problem. To fix it, go to your keyboard shortcut preferences (System Settings -> Keyboard -> Shortcuts -> Sound and Media) and change the accelerators for “Play (or play/pause)”, “Stop playback”, “Previous track” and “Next track” to something different. I changed them to Ctrl+[pause/play button], Ctrl+[stop button], Ctrl+[previous] and Ctrl+[next] respectively. Once you’ve done this, xbindkeys will be able to capture the shortcuts. Keep in mind that if xbindkeys is already running in the background, xbindkeys -k will still not capture the keys, but the bindings themselves will work.

Tuesday, February 19, 2013

Fast Mersenne Twister in PHP

I was recently trying to write a class in PHP that I could use to encrypt and decrypt data for communication between our server and remote clients. As I got started I realized I was going to need to generate a lot of pseudo-random numbers. However, I needed to be able to reproduce the same numbers given the same seeds on a different system. The Mersenne Twister algorithm is a well-known, solid random number generator - and apparently faster than PHP’s rand() function. I realized PHP has an implementation of the Mersenne Twister algorithm called mt_rand(), so I thought that would make life easy. It turns out that as of PHP 5.2.1 the same seed no longer produces the same sequence of values it did in earlier versions, as stated in the documentation for mt_srand():

The Mersenne Twister implementation in PHP now uses a new seeding algorithm by Richard Wagner. Identical seeds no longer produce the same sequence of values they did in previous versions. This behavior is not expected to change again, but it is considered unsafe to rely upon it nonetheless.

So I decided I would go ahead and implement the algorithm myself. At first I was just looking to see if I could find anyone else’s PHP implementation of the algorithm, but I had no luck with Google. After that I went to Wikipedia and worked from the pseudocode, which you should check out if you really want to understand the algorithm. That much is pretty straightforward. However, my purpose in writing this article is twofold:

  1. I want to make a PHP implementation of the Mersenne Twister publicly available for other seekers
  2. I want to discuss some of my modifications to enhance the speed

Here’s my implementation:

/**
 * Mersenne Twister Random Number Generator
 * Returns a random number. Depending on the application, you likely don't have to reseed every time as
 * you can simply select a different index from the last seed and get a different number - it is already
 * auto-incremented each time mtr() is called.
 *
 * Note: This has been tweaked for performance. It still generates the same numbers as the original
 *       algorithm but it's slightly more complicated because it maintains both speed and elegance.
 *       Though not a crucial difference under normal use, on 1 million iterations, re-seeding each time,
 *       it will save 5 minutes of time over the original algorithm - at least on my system.
 *
 * $seed  : Any number, used to seed the generator. Default is time().
 * $min   : The minimum number to return
 * $max   : The maximum number to return
 *
 * Returns the random number as an INT
 **/
function mtr($seed = null, $min = 0, $max = 1000)
{
  static $mt = array(); // 624 element array used to get random numbers
  static $ps = null; // Previous Seed
  static $idx = 0; // The index to use when selecting a number to randomize


  // Seed if none was given
  if($seed === null && !$ps) {
    $seed = time();
  }


  // Regenerate when reseeding or seeding initially
  if($seed !== null && $seed !== $ps) {
    $mt = array(&$seed, 624 => &$seed);
    $ps = $seed;


    for($i = 1; $i < 624; ++$i) {
      $mt[$i] = (0x6c078965 * ($mt[$i - 1] ^ ($mt[$i - 1] >> 30)) + $i) & 0xffffffff;
    }
  }


  // Every 624 numbers we regenerate
  if($idx == 0) {
    // This has been tweaked for maximum speed and elegance
    // Explanation of possibly confusing variables:
    //   $p = previous index
    //   $sp = split parts of array - the numbers at which to stop and continue on
    //   $n = number to iterate to - we loop up to 227 adding 397 after which we finish looping up to 624
    //        subtracting 227 to continue getting out 397 indices ahead reference
    //   $m = 397 or -227 to add to $i to keep our 397 index difference
    //   $i = the previous element in $sp, our starting index in this iteration
    for($j = 1, $sp = array(0, 227, 397); $j < count($sp); ++$j) {
      for($p = $j - 1, $i = $sp[$p], $m = ((624 - $sp[$j]) * ($p ? -1 : 1)), $n = ($sp[$j] + $sp[$p]); $i < $n; ++$i) {
        $y = ($mt[$i] & 0x80000000) + ($mt[$i + 1] & 0x7fffffff);
        $mt[$i] = $mt[$i + $m] ^ ($y >> 1) ^ (($y & 0x1) * 0x9908b0df);
      }
    }
  }


  // Select a number from the array and randomize it
  $y = $mt[$idx];
  $y ^= $y >> 11;
  $y ^= ($y << 7) & 0x9d2c5680;
  $y ^= ($y << 15) & 0xefc60000;
  $y ^= $y >> 18;


  // Increment the index so we will be able to get a different random number without changing the seed
  $idx++;
  if($idx == 624) $idx = 0;


  return $y % ($max - $min + 1) + $min;
}

I’m going to focus on two lines in the code above: the line that seeds the $mt array, and the line inside the regeneration loop that indexes with $mt[$i + $m].

First you’ll notice the array called $mt. Before filling this array, two values are assigned. In the original algorithm, only the first value is assigned, to the last 32 bits of the seed. But that creates a slowdown, because in the second for loop, where you see $mt[$i + 1], the original reads $mt[($i + 1) % 624]. Integer division is one of the slowest arithmetic operations on a CPU, and since modulus returns the remainder of a division, it still has to do the division. That becomes a big slowdown when doing a lot of iterations. This line just looks painful to me because it does the modulus 624 times and it only has an effect on the very last iteration.

So I set out to find a better solution. As I was lying in bed pondering, the thought struck me that all we really need to do is create a 625-element array, but set the last element to be an actual reference to the first (since the value of the first element changes on the first iteration). Then there’s no longer a need for a modulus: the last element of the array is the first element, and we still stop at 624, so there are no boundary problems.

Well, that eliminates one modulus, but there’s still a couple more in the pseudocode. The next line of code, before modification, looks like this:

$mt[$i] = $mt[($i + 397) % 624] ^ ($y >> 1) ^ $op[$y & 0x1];

There are several possible ways to eliminate the second, but after hours of playing around with it I couldn’t seem to find anything that still looked clean and elegant. It seemed that it would require adding more lines of code and using if statements and too many extra variables. The idea that finally worked came to me, once again, while I was lying in bed pondering it. Somehow my best ideas always come when I’m trying to sleep.

To solve this, we first have to notice that the algorithm is always trying to access the index that is 397 ahead of the current one. So, if we want to eliminate the modulus, then as soon as we hit index 227 we have to somehow jump to element 0 and count up from there. The only reasonably elegant way of doing this without the modulus that I could find was to stick the exact indices at which we need to switch in an array and loop through those. So there is a 3-element array with the values 0, 227, and 397; however, we only loop through the last 2. The first one is used simply so the inner for loop knows what index to begin counting up from.

Now, since I can’t really be bothered explaining much more today, I’ll simply point out that because of the * ($p ? -1 : 1) trick, as soon as it reaches 227, it loops through the outer for loop again, jumping to 397; however, this time it’s subtracting 227. By subtracting 227 it creates a 397 index difference. In other words, (620 + 397) % 624 is the same as 620 - 227. Modulus #2 eliminated.

Probably the most painful part of the algorithm, if implemented straight from the pseudocode, is the modulus in the if statement. It goes inside the for loop and looks something like this:

// If $y is odd
if($y % 2 == 1) {
  $y ^= 0x9908b0df;
}

Any good developer knows there’s a better way to tell if a number is odd. In binary, all odd numbers end with a 1, which means all you have to do to test whether a number is odd is:

if($y & 1) {
  $y ^= 0x9908b0df;
}

This is a virtually instantaneous bitwise operation, versus the many CPU cycles a division operation might take. But we can do even better in this particular case. We can eliminate the if statement entirely (and thus avoid CPU branching). Basically, all we want to do is XOR the odd numbers with 0x9908b0df. Instead of doing the if, let’s just always do this:

$mt[$i] = $mt[$i + $m] ^ ($y >> 1) ^ (($y & 0x1) * 0x9908b0df);

Because the value of $y & 1 will always be either 1 or 0, we can multiply it by our magic number to determine whether it modifies the current value. The XOR always runs, but it only has an effect on odd numbers. This is still far better, since an XOR takes a single cycle, whereas the division and branching would have been much more expensive.

The last modulus is just replaced with an if statement, which should be easily predicted by the CPU’s branch predictor (see this and look at this if you still don’t get it), making it much faster.

That’s it! We’ve eliminated all 4 modulus operators and the if statement. For the number of iterations my encryption algorithm requires, re-seeding each time, it makes a big difference in speed. Hopefully somebody finds this useful. If you find any problems with the code or want to suggest improvements, please let me know.

Please keep in mind that all this math results in numbers far too large for a 32-bit integer, and will thus produce different values from the same seed on 32-bit versus 64-bit systems. To avoid problems (and ensure consistent results), this should either be written using a big-math library such as GMP or BCMath, or all numbers need to be truncated (i.e. $num & 0xFFFFFFFF) after each multiplication or left bit shift in the algorithm. Another consideration in PHP is that it does not support unsigned integers, which will also throw off the results if you’re not using a big-math library (PHP copies the sign bit when doing a right bit shift). I have written my own math library for PHP which can do all math and bitwise operations on normal PHP integers, treating them as unsigned.

Thursday, January 24, 2013

Closures in loops - JavaScript Gotchas

When you do enough JavaScript programming, it’s difficult to avoid learning about this sooner or later, but I still feel like it’s not well known enough. If you learn about it from a blog post like this, you’re one of the lucky ones. Once upon a time I spent hours going crazy thinking the world didn’t make sense anymore - that, or every JavaScript implementation had a serious bug. I was trying to figure out why a closure within a for loop seemed to have the same value for every iteration of the loop - it just wasn’t changing even though the loop was iterating properly. I was baffled. Isn’t a closure supposed to capture the current value of variables within its accessible scope?

Here’s a simple piece of code you can run to see what I’m talking about:

for(var i = 0; i < 3; i++) {
    setTimeout(function() {
        console.log(i);
    }, 0);
}

If you run that in a JavaScript console, it’ll output something like this:

3
3
3

Wait! What? Ok, so maybe I gave it away by wrapping it in a call to setTimeout. What’s happening here? It turns out JavaScript is single-threaded, so it keeps an event queue where it queues up things to do. The closure created in each loop iteration is queued to run as soon as the rest of the current execution context finishes and CPU time is returned to the event loop. setTimeout here simply serves to defer the execution of each closure until after the loop finishes running. By that point the final value of i is 3. Keep in mind that i was declared with var, so there is only one i and its scope extends beyond the loop.
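
You can see the same deferral without any loop at all; even with a 0ms delay, the callback doesn’t run until the current synchronous code has finished:

console.log('before');
setTimeout(function() {
    console.log('inside timeout');
}, 0);
console.log('after');  // output order: before, after, inside timeout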

Ok, that makes sense, but what if I do something like this:

for(var i = 0; i < 3; i++) {
    var j = i;
    setTimeout(function() {
        console.log(j);
    });
}

Now we’re declaring a variable j inside the loop that is a copy of i. Well, it turns out that the scope of j here is also external to the loop. It continues to exist outside the loop, and there is only ever one j, whose value changes as the loop iterates. This in turn means that when the deferred closures run, they all end up with the latest value. In JavaScript it is very common practice to pass callbacks around that are deferred until something completes or some event occurs, so like I said, sooner or later you’re likely to run into this problem. It’s gotten me more than once, even though I knew about it.

So what can I do then to make my closures work right in a loop?

I’m glad you asked! There are a few different ways you can handle it, but the basic concept is that you have to capture the value of i in each iteration of the loop. By far the cleanest solution is to just use an iterator function like _.each or $.each from the Underscore or jQuery library:

_.each(_.range(3), function(i) {
    setTimeout(function() {
        console.log(i);
    });
});

If you already have a loop that you don’t want to convert to use an iterator function, all you have to do is wrap your closure in another closure that defines new variables capturing the current values of the variables that change on each iteration. Got that? The trick is making sure the outer closure executes immediately, during the current iteration of the loop. You can use one of these two similar approaches:

Method 1:

for(var i = 0; i < 3; i++) {
    (function() {
        var num = i;
        setTimeout(function() {
            console.log(num);
        });
    })();
}

Method 2:

for(var i = 0; i < 3; i++) {
    (function(i) {
        setTimeout(function() {
            console.log(i);
        });
    })(i);
}

And that’s it! The world makes sense again! Happy coding :)


UPDATE - 10/7/2013

There is now another, simpler solution to this problem, since the let keyword is now supported in both Firefox and Chrome. The let keyword is used in place of var to scope a variable to the block it’s declared in. Take the earlier example that did not work, swap var for let, and it works magically, like so:

for(var i = 0; i < 3; i++) {
    let j = i;
    setTimeout(function() {
        console.log(j);
    });
}

This works because we declare a new variable, j, set to the value of i, and j is what the closure captures. Since j is block-scoped, each iteration of the loop gets its own j instead of sharing a single one that keeps changing.
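
And since let gives you a fresh binding for each iteration of a for loop (with ES6-style semantics, at least), you can go one step further and declare the loop variable itself with let, skipping the extra variable entirely:

for(let i = 0; i < 3; i++) {
    setTimeout(function() {
        console.log(i);  // logs 0, 1, 2 - each iteration captures its own i
    });
}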

Tuesday, January 15, 2013

Password Insanity

I’m going to pick on password security requirements for a bit. I was just forced to change my password for a certain website that requires its users to change it every 90 days. This is a common requirement in a lot of places, but also a potentially stupid one. Let’s look at this for a minute.

Being a programmer, I consider myself an expert user. I think about security all the time. I have several different randomly generated passwords containing numbers, letters, and in some cases symbols. I have to have a variety because of the various ridiculous password requirements of different websites and services. We’ll get to that in a minute. So I use one of these passwords for a website I sign up at. Later I log in to discover that I’m now being forced to change my password. It’s not long before I run out of my more secure passwords. At this point I’m frustrated and I resort to far less secure password practices.

I could generate a new password or make one up each time I have to change it, and write it down or store it in a password vault. If I weren’t an expert user, I’d be more likely to write it down, which is far less secure. Not only is this an inconvenience, because I have to look up my password every time I log in if it’s not a place I log in to often, but it also means that my password is physically stored somewhere. That will always be less secure than if the only place it was stored was in my head.

What I usually end up doing is using far less secure passwords where I can just increment a number or reverse the password a bit. For example, on one such site I used abcd1234; then, when it expired, I reversed it to 1234abcd. I also stored it in a file so I wouldn’t have to go through the whole password-reset process when I inevitably forgot what I used for that site. I used to have a nice quality password there that was stored only in my head.

Now let’s take a practical look at password restrictions. On this same website they had a list of requirements when setting a password. A lot of administrators implement strict requirements believing this leads to better security. Let’s have a look at some of these requirements and what they say to a hacker:

  • Password must be EXACTLY 8 characters

    Really? OK, so if I’m a hacker configuring my password-cracking software to brute force a user’s password, I now know that I don’t have to bother trying anything shorter OR longer than 8 characters. This narrows the search down to a VERY small number of candidate passwords the computer has to try.

  • Password must contain at least 1 letter and 1 number

    OK, so now I know that I can probably find a large chunk of users’ passwords with a simple dictionary attack with mutations. This means the program will try variations such as adding common numbers to the beginning and end of a password and adjusting the casing of letters. I don’t even need to bother with a plain dictionary attack. And believe it or not, having a number in your password doesn’t magically make it secure, and you can have a very secure password with only letters.

  • Password must NOT contain any symbols

    This is getting ridiculous. The number of passwords I have to try to successfully guess a user’s password has just been cut by orders of magnitude (see the back-of-the-envelope numbers after this list). If the software/website literally can’t support symbols, then it was terribly written by terrible programmers. There is no excuse for this.
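
To put some rough numbers on it, here’s a back-of-the-envelope sketch that just counts raw character/length combinations (it ignores dictionary tricks, so treat the figures as ballpark only):

// How many candidate passwords does a brute-force attack have to cover?
function keyspace(alphabetSize, minLen, maxLen) {
    var total = 0;
    for (var len = minLen; len <= maxLen; len++) {
        total += Math.pow(alphabetSize, len);
    }
    return total;
}

// No silly restrictions: all 94 printable ASCII characters, 8 to 16 characters long
console.log(keyspace(94, 8, 16).toExponential(1));  // ~3.8e+31 candidates

// "Exactly 8 characters, letters and digits only"
console.log(keyspace(62, 8, 8).toExponential(1));   // ~2.2e+14 candidates - about 17 orders of magnitude fewer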

These are just a few of the actual requirements that I see still in place around the web. And guess what? If I can get an MD5/SHA-whatever hash of your password, then I can just run it against rainbow tables that are widely available for download and have any user’s password in seconds. Unless, of course, the hash is salted - but we won’t get into that now.

So, you may think you’re doing your users a favor by enforcing strict password requirements, but you’re not. I would simply require a minimum of 8 characters and leave it at that. Want to know how to generate a very secure, yet easy-to-remember password? Check it out! http://xkcd.com/936/

Also, as a side note, if you ever find a website that emails you your password in plain text, I would panic! It means they’re storing it in plain text (or at best reversibly encrypted) in their database as well. Never use a password you care about with one of these websites. And make sure you send them many angry emails!