Support and General Use > Audio Playback, Database and Playlists

Slight bias in playlist randomization

(1/1)

dohashi:
I've done some programming where having uniform random numbers is really important, and so I noticed a small non-uniformity in the way playlists are randomized.  Basically using the mod operator (%) won't generate really uniform numbers.  Currently the code maps a random number in 0..RAND_MAX to 0..count by doing

         /* the rand is from 0 to RAND_MAX, so adjust to our value range */
        candidate = rand() % (count + 1);

To see the non-uniformity, let RAND_MAX be 7 (2^3-1), and let count be 6

rand return                            candidate
0                                            0
1                                            1
2                                            2
3                                            3
4                                            4
5                                            5
6                                            6
7                                            0

Thus candidate will be 0 twice as often as the other possible values.

Of course when RAND_MAX is large relative to count, this effect is much smaller as all the possible values for candidate are mapped to by multiple return values of rand.  The "bad" candidates will only have one additional value mapped too them.

I don't know if you consider this a significant problem or not, but would be willing to submit a patch to fix this.  Basically you want something like this

let n be the smallest power of 2 greater than count+1 (say n = 2^b)

candidate = rand() & (n-1); /* mask off all but the low b bits */
while ( candidate > count ) /* is candidate too big? */
{
    candidate = rand() & (n-1); /* generate another number */
}

Although this may look slow, the expected number of calls to rand (in the worst case) is 2 (more that half the generated numbers will be valid) so you'll find a valid number pretty quickly.

Given that you went to all the trouble of using a mersenne twister to generate the numbers, I though you might want to fix this too.

Darin

gevaerts:
RAND_MAX is 2^31. Assuming you have 30000 tracks (which I think is really a lot), that means you the non-uniformity is about 1 in 2^16. Are you seriously considering this to be a problem?

Also, in your fix, the expected number of calls to rand is indeed 2, but it is unbounded. Do we really want potentially infinite loops in playlist shufling? I agree that this won't happen in practice, but let's face it, the entire problem you try to solve is academic anyway.

dohashi:

I thought I was pretty clear that this is a minor problem, I was more asking if anyone cared.

Rockbox uses a mersenne twister to generate numbers.  That algorithm has a period of  2^19937 − 1, that seems like a bit overkill to randomize those same 30000 songs.  It seemed like someone thought that having really, really random playlists was important. 

I've not done the math, but the 1/2^16 bias is probably larger than the difference between using a mersenne twister and a far simpler algorithm.

Darin

gevaerts:
I fully agree that a mersenne twister is overkill for playlist shufling, but keep in mind that rand() is not only used there. Plugins can use it as well

Multiplex:

--- Quote from: dohashi on August 08, 2009, 01:07:29 PM ---It seemed like someone thought that having really, really random playlists was important. 
--- End quote ---
If I remember correctly that was indeed one of the original drivers for Rockbox...

Navigation

[0] Message Index

Go to full version