Tuesday, February 19, 2013

Fast Mersenne Twister in PHP

I was recently trying to write a class in PHP that I could use to encrypt and decrypt data for communication between our server and remote clients. As I got started I realized I was going to need to generate a lot of pseudo-random numbers. However, I needed to be able to reproduce the same numbers given the same seeds on a different system. The Mersenne Twister algorithm is a well-known, solid random number generator - and apparently faster than PHP’s rand() function. I realized PHP has an implementation of the Mersenne Twister algorithm call mt_rand(), so I thought that would make life easy. It turns out that as of PHP 5.2.1 the same seed no longer produces the same values as stated in the documentation for mt_srand():

The Mersenne Twister implementation in PHP now uses a new seeding algorithm by Richard Wagner. Identical seeds no longer produce the same sequence of values they did in previous versions. This behavior is not expected to change again, but it is considered unsafe to rely upon it nonetheless.

So I decided I would go ahead and implement the algorithm myself. At first, I was just looking to see if I could find anyone else’s PHP implementation of the algorithm, but I had no luck with Google. After that I went to Wikipedia and worked from the pseudocode, which you should check out if you really want to understand the algorithm. That much is pretty straight forward. However, my purpose in writing this article is two-fold:

  1. I want to make a PHP implementation of the Mersenne Twister publicly available for other seekers
  2. I want to discuss some of my modifications to enhance the speed

Here’s my implementation:

/**
 * Mersenne Twister Random Number Generator
 * Returns a random number. Depending on the application, you likely don't have to reseed every time as
 * you can simply select a different index from the last seed and get a different number - it is already
 * auto-incremented each time mt() is called unless the index is specified manually.
 *
 * Note: This has been tweaked for performance. It still generates the same numbers as the original
 *       algorithm but it's slightly more complicated because it maintains both speed and elegance.
 *       Though not a crucial difference under normal use, on 1 million iterations, re-seeding each time,
 *       it will save 5 minutes of time from the orginal algorithm - at least on my system.
 *
 * $seed  : Any number, used to seed the generator. Default is time().
 * $min   : The minimum number to return
 * $max   : The maximum number to return
 *
 * Returns the random number as an INT
 **/
function mtr($seed = null, $min = 0, $max = 1000)
{
  static $mt = array(); // 624 element array used to get random numbers
  static $ps = null; // Previous Seed
  static $idx = 0; // The index to use when selecting a number to randomize


  // Seed if none was given
  if($seed === null && !$ps) {
    $seed = time();
  }


  // Regenerate when reseeding or seeding initially
  if($seed !== null && $seed !== $ps) {
    $mt = array(&$seed, 624 => &$seed);
    $ps = $seed;


  for($i = 1; $i < 624; ++$i) {
      $mt[$i] = (0x6c078965 * ($mt[$i - 1] ^ ($mt[$i - 1] >> 30)) + $i) & 0xffffffff;
    }
  }


  // Every 624 numbers we regenerate
  if($idx == 0) {
    // This has been tweaked for maximum speed and elegance
    // Explanation of possibly confusing variables:
    //   $p = previous index
    //   $sp = split parts of array - the numbers at which to stop and continue on
    //   $n = number to iterate to - we loop up to 227 adding 397 after which we finish looping up to 624
    //        subtracting 227 to continue getting out 397 indices ahead reference
    //   $m = 397 or -227 to add to $i to keep our 397 index difference
    //   $i = the previous element in $sp, our starting index in this iteration
    for($j = 1, $sp = array(0, 227, 397); $j < count($sp); ++$j) {
      for($p = $j - 1, $i = $sp[$p], $m = ((624 - $sp[$j]) * ($p ? -1 : 1)), $n = ($sp[$j] + $sp[$p]); $i < $n; ++$i) {
        $y = ($mt[$i] & 0x80000000) + ($mt[$i + 1] & 0x7fffffff);
        $mt[$i] = $mt[$i + $m] ^ (($y & 0x1) * 0x9908b0df);
      }
    }
  }


  // Select a number from the array and randomize it
  $y = $mt[$idx];
  $y ^= $y >> 11;
  $y ^= ($y << 7) & 0x9d2c5680;
  $y ^= ($y << 15) & 0xefc60000;
  $y ^= $y >> 18;


  // Increment the index so we will be able to get a different random number without changing the seed
  $idx++;
  if($idx == 624) $idx = 0;


 return $y % ($max - $min + 1) + $min;
}

I’m going to focus on lines 31 and 52 in the code above (which are no longer highlighted, so have fun finding them!). First you’ll notice the array called $mt. Before filling this array, two values are assigned. In the original algorithm, only the first value is assigned to the last 32 bits of the seed. But this creates a slow down because then in the second for loop where you see $mt[$i + 1] this reads $mt[($i + 1) % 624] in the original. Division is the slowest math operation on a computer. Since modulus returns the remainder of a division operation, it obviously still has to do the division operation. Thus, this becomes a big slowdown when doing a lot of iterations. This line just looks painful to me because it does the modulus 624 times and it only has an effect on the very last iteration. So, I set out to find a better solution. As I was laying in bed pondering, the thought struck me that all we really need to do is create a 625 element array, but set the last element to the actual reference of the first (since the value of the first element changes on the first iteration). Then, there’s no longer a need for a modulus. The last element of the array is the first element, and we still stop at 624, so there are no boundary problems. Well, that eliminates one modulus, but there’s still a couple more in the pseudocode. The next line of code, before modification, looks like this:

$mt[$i] = $mt[($i + 397) % 624] ^ ($y >> 1) ^ $op[$y & 0x1];

There are several possible ways to eliminate the second, but after hours of playing around with it, I couldn’t seem to find anything that sill looked clean and elegant. It seemed that it would require adding more lines of code and using if statements and too many extra variables. The idea that finally worked came to me, once again, while I was laying in bed pondering about it. Somehow my best ideas always come when I’m trying to sleep.

To solve this, we first have to notice that the algorithm is always trying to access the index that is 397 ahead of the current one. So, if we want to eliminate the modulus, then as soon as we hit index 227 we have to somehow jump to element 0, and count up from there. The only reasonably elegant way of doing this without the modulus that I could find, was to stick the exact indices at which we need to switch in an array, and loop through those. So there is a 3 element array with the values 0, 227, and 397; however, we only loop through the last 2. The first one is used simply so the inner for loop knows what index to begin counting up from.

Now, since I can’t really be bothered explaining much more today, I’ll simply point out that because of the * ($p ? -1 : 1) trick, as soon as it reaches 227, it loops through the outer for loop again, jumping to 397; however, this time it’s subtracting 227. By subtracting 227 it creates a 397 index difference. In other words, (620 + 397) % 624 is the same as 620 - 227. Modulus #2 eliminated.

Probably the most painful part of the algorithm if implemented straight from the pseudocode, is the modulus in the if statement. It goes inside the for loop and looks something like this:

// If $y is odd
if($y % 2 == 1) {
  $y ^= 0x9908b0df;
}

Any good developer knows there’s a better way to tell if a number is odd. In binary all odd numbers end with a 1 which means all you have to do to test if the number is odd is:

if($y & 1) {
  $y ^= 0x9908b0df;
}

This is a virtually instantaneous bitwise operation, versus the countless CPU cycles a division operation might take. But, we can still do better than this in this particular case. We can eliminate the if statement (thus avoiding CPU branching). Basically all we want to do is XOR all odd numbers with 0x9908b0df. Instead of doing the if, let’s just always do this:

$mt[$i] = $mt[$i + $m] ^ ($y >> 1) ^ (($y & 0x1) * 0x9908b0df);

Because the value of $y & 1 will always be either 1 or 0, we can multiply that by our magic number here to determine whether it will modify the current value. The XOR will always run but it will only have an effect on odd numbers. This is still far better since an XOR takes only one cycle where the division and branching would have been much more intensive.

The last modulus is just replaced with an if statement which should be easily predicted with the CPU’s branch predictor (see this and look at this if you still don’t get it), allowing it to be much faster.

That’s it! We’ve eliminated all 4 modulus operators and the if statement. When doing the number of iterations, re-seeding each time, required for my encryption algorithm, it makes a big difference in speed. Hopefully somebody finds this useful. If you find any problems with the code or want to suggest improvements, please let me know.

Please keep in mind that all this math results in numbers far too large for a 32-bit integer and will thus result in different values with the same seed between a 32 and 64-bit processor. To avoid problems (ensure consistent results) this should either be written using a big math library, such as GMP or BCMath, or all numbers need to be truncated (i.e. $num & 0xFFFFFFFF) after each multiplication or left bit shift operation in the algorithm. Another consideration in PHP is that it does not support unsigned integers, which will also throw off the results if not using a big math library (PHP copies the sign bit when doing a right bit shift). I have written my own math library for PHP which can do all math and bitwise operations on normal PHP integers, treating them as unsigned.