Share it!

Although you might not expect it, random numbers are used in databases quite often, especially when generating dummy data, providing randomized lists of products, to name just a couple.

For instance, let’s say you have a table named “productData” and inside this table there’s a field named “randCol”. Then, if you want to update each row and place a value in the 1-100 range in “randCol”, you could do something like this:

UPDATE `productData` SET `randCol` = FLOOR( 1 + RAND( ) *100 );

Wait! Why not simply use FLOOR( RAND( ) *100 )? Well, according to the MySQL documentation for RAND(), this function always returns a random floating-point value v in the range 0 <= v < 1.0, so it won’t ever return 1, leaving the high end of the range out. RAND would also return a 0 at times, which in turn would return a zero in the product calculation, an invalid value in our desired range.

With that in mind, the largest value which could be generated by RAND() * 100 would be 99.999999, which when floored would give 99. On the other side of the range, the smallest value would occur when RAND() returns 0. When adding 1 to these values we have both ends of the range are covered and no numbers outside it will ever be generated.

Generating UNIQUE random numbers in a table

The previous query is useful when you want to fill a column with random numbers within a range. But this query does not guarantee unique values for each row, even with a very large range. So, if we want unique values in each row while randomness, we could go for something like this.

SET @rnd := 0;
UPDATE `productData` SET `randCol`= (@rnd := @rnd + 1)
ORDER BY RAND();

Here’s how it works. We first set the user-defined variable @rnd to 0, then we update all rows while sorting using RAND() as our sorting variable. So, for each updated row, we assign it the (@rnd + 1) value while storing this value in the same variable. This way, the next row will get this value + 1 and since we’re sorting randomly, each row will end up having a random AND unique value.  The number set would be sequential but each row get a random number within that set, without duplicates.

Some caution: Since this query uses a User Defined Variable (@rnd) and these variables are connection specific, you can NOT use this MySQL query with a system that uses connection pooling or persistent connections. Using them on these environments would return wrong results, as connections would step over the value of @rnd, messing up the results.

Retrieving rows in random order

Sometimes, it is useful to get a list of data from a database (e.g. offers) and present it to the user in the frontend. However, showing the same list every time can bore the user and force him/her exit your website, losing potential leads/customers.

In these cases, showing a random list of offers can be a better choice, as each time the user visits the site, there would be different ones, thereby increasing conversion potential.

Achieving this is rather simple, as offers can be sorted by using a query like this one:

SELECT * FROM  `productData` ORDER BY RAND();

Easy, huh? It is. However, here comes the “dark side” of using this solution: MySQL will create a temporary table with all results, then will be assigned a random index to each row and finally will return the random rows! Also, generating random numbers imposes a heavy burden on the server, as they’re created using complex math calculations. So this solution is not a good idea if you’re dealing with a table with a lot of rows.

Dealing with pagination

From the example above, we saw that getting rows in a random order is very easy (although, as mentioned before, it can get quite slow if managing a large number of rows).  However, using this query in a pagination context, where we return a chunk of X rows from the table on each query and then get other chunks in successive queries. Like so:

SELECT * FROM `productData` ORDER BY RAND() LIMIT chunkSize OFFSET start;

Why? Because we’re randomly sorting rows, each SELECT query will return a different set of rows, and that means that returned rows will not maintain the same order on each call (randomly ordered, basically) can come up more than once and some rows might never be returned.

So, how do we solve this? MySQL’s RAND() function takes an optional value, called “seed” value. This value is used to set-up the randomizer engine. If no seed is provided, one is generated by MySQL “under the hood”, so each time the function is called, different seed values would be provided.

One implication of using a fixed seed value is that for equal argument values, RAND() produces a repeatable sequence of column values. This means that each time the query is run, rows will be sorted in the same order, no matter how many times we call it. So now we can retrieve each “chunk” of rows, randomly sorted but without fearing repeated or missing rows.

But how do we provide a randomized list, then? There are many options for this. One option is to retrieve a random value from MySQL itself and then use it on each call. Another one is to use the randomizer engine from the language you’re coding with. For instance, if you’re using PHP, you could use rand() or mt_rand() to generate the seed value and use it on the MySQL queries  (Note: as of PHP 7.1.0 both use the same engine, the Mersenne Twister generator, so it’s the same to use one or the other).

This way, each time we request a “page” and pass the same seed value, we can be certain that no rows will be left out and no duplicate rows will come up.

 


Share it!