Yeah, that sucks.
I'm still doing some reverse-engineering on that, but my gut tells me that an RMS algorithm might work.
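For anyone who wants to experiment, here's a rough sketch of what an RMS comparison could look like. It assumes two fingerprints are compared byte-by-byte as equal-length vectors. The name `rms_distance` is my own, and whether plain RMS over raw byte values is meaningful for these fingerprints is still just a guess.

```python
import math


def rms_distance(a: bytes, b: bytes) -> float:
    """Root-mean-square difference between two equal-length byte vectors.

    Hypothetical helper: assumes fingerprints are compared element-wise,
    with each byte treated as an unsigned value (0..255).
    """
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    if not a:
        raise ValueError("fingerprints must not be empty")
    # Iterating over bytes yields ints, so this is plain integer arithmetic.
    total = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(a))
```

For example, `rms_distance(bytes([0, 0, 0, 0]), bytes([2, 2, 2, 2]))` gives `2.0`. A smaller value would (under this assumption) mean more similar tracks.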
The 516 bytes of fingerprint metadata (after base64 decoding) are divided into 512 one-byte "frequency coefficients" followed by 4 one-byte "peak frequencies". Obviously, some investigation is still required.

I've enjoyed using the MusicIP stuff (formerly MusicMagic) for many years, and Rockbox is my second chance to get a player with this technology (the Entempo Rubato player was the first!).
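Here's a minimal Python sketch of splitting the decoded metadata along those lines. The names (`parse_fingerprint`, `Fingerprint`) are hypothetical, and treating each byte as an unsigned value is an assumption. The signedness and scaling of the coefficients haven't been confirmed yet.

```python
import base64
from typing import NamedTuple

COEFF_COUNT = 512                          # one-byte "frequency coefficients"
PEAK_COUNT = 4                             # one-byte "peak frequencies"
FINGERPRINT_LEN = COEFF_COUNT + PEAK_COUNT # 516 bytes total


class Fingerprint(NamedTuple):
    # Hypothetical container; bytes objects index as unsigned ints (assumption).
    coefficients: bytes
    peaks: bytes


def parse_fingerprint(encoded: str) -> Fingerprint:
    """Decode the base64 metadata and split it into coefficients and peaks."""
    raw = base64.b64decode(encoded)
    if len(raw) != FINGERPRINT_LEN:
        raise ValueError(f"expected {FINGERPRINT_LEN} bytes, got {len(raw)}")
    # First 512 bytes are coefficients; the trailing 4 are the peaks.
    return Fingerprint(raw[:COEFF_COUNT], raw[COEFF_COUNT:])
```

The coefficient vectors from two parsed fingerprints could then be fed into whatever comparison turns out to work, such as an RMS distance.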