Honestly, tests should come in pairs:
One in Rockbox, and one in the original firmware, both using identical settings. That means you can only use settings in Rockbox that you can reproduce in the original firmware, so shuffle, for example, should not be used. This way you get to see whether your Rockbox runtime is better or worse than the original firmware's on the same player and battery.
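As a rough sketch of how a pair could be boiled down to one number, you could express the Rockbox runtime as a fraction of the original firmware runtime. The function name and the runtimes below are made up purely for illustration; they are not real measurements.

```python
def runtime_ratio(rockbox_hours, original_hours):
    """Return the Rockbox runtime as a fraction of the original firmware runtime.

    Both runtimes must come from the same player and battery, measured
    with identical settings, or the ratio is meaningless.
    """
    if rockbox_hours <= 0 or original_hours <= 0:
        raise ValueError("runtimes must be positive")
    return rockbox_hours / original_hours


# Hypothetical pair of results, in hours (not real data).
ratio = runtime_ratio(12.0, 15.0)
print(f"Rockbox ran for {ratio:.0%} of the original firmware's runtime")
```

A ratio above 1 would mean Rockbox outlasted the original firmware in that pair, and a ratio below 1 would mean it fell short.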
Unfortunately, a test on its own doesn't really tell you much, since you don't have a baseline for your battery's condition. Also, formats like FLAC decode faster than realtime, so they don't require boosting, but they involve spinning up the hard drive a lot. That means they can't really be compared to lossy codecs like MP3, and it's hard to place a FLAC result on a scale without other FLAC tests.
That being said, anyone is free to upload their results, but in most cases it's hard to compare them or to tell from them how things are doing.