Decoding performance on iPOD
preglow:
--- Quote from: Buschel on May 13, 2007, 11:04:21 AM ---Regarding the use of doubles/float in the synthesis-filter: As far as I can see there are several scaling operations in calc_new_V() which use the macros MPC_SCALE_CONST, MPC_MULTIPLY_FRACT_CONST_FIX and similar ones. These macros are expanded with usage of the MAKE_MPC_SAMPLE_EX macro. This one will use double-conversion and -multiplication.
--- End quote ---
And if you have a look in math.h, you'll see that the floating-point numbers are converted to fixed-point numbers by MAKE_MPC_SAMPLE_EX, MAKE_MPC_SAMPLE and other macros like them. If GCC does not do this conversion at compile time, it's even more crap than I thought it was. Looking at Calculate_New_V() in a disassembler, I can see no floating-point calls, but a lot of mac.l instructions, which indicates that things are working as they should.
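To illustrate the point about compile-time conversion, here is a minimal sketch in C. The names TO_FIXED, FRACT_BITS, COEFF_HALF and scale_by_half are made up for this example and are not the actual macros from math.h; the 14-bit fraction is an assumption as well. Because the macro argument is a floating-point literal, the whole expression is a constant expression, so a compiler can fold it into a plain integer and no floating-point code needs to remain at run time.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the real Musepack macros. */
#define FRACT_BITS 14   /* assumed number of fractional bits */
#define TO_FIXED(x) ((int32_t)((x) * (double)(1 << FRACT_BITS)))

/* Arithmetic constant expression: valid as a static initializer, so the
 * float-to-fixed conversion is done by the compiler, not at run time. */
static const int32_t COEFF_HALF = TO_FIXED(0.5);

/* Scale a sample by 0.5 using integer arithmetic only. */
int32_t scale_by_half(int32_t sample)
{
    return (int32_t)(((int64_t)sample * COEFF_HALF) >> FRACT_BITS);
}
```

Checking the disassembly of a function like scale_by_half() for floating-point library calls is the same kind of check as the one described above.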
--- Quote from: Buschel on May 13, 2007, 11:04:21 AM ---I removed these conversions via defining additional constants and a new macro. Now either 64bit- or 32bit-multiplies are possible.
--- End quote ---
And you're sure this alone increases performance? I'll sure be looking forward to seeing a patch if this is the case.
--- Quote from: Buschel on May 13, 2007, 11:04:21 AM ---Currently I am measuring different combinations of the optimizations, including using -O1 instead of -O2. With all optimizations included (also the accuracy reduction), performance went up by up to 18%.
--- End quote ---
Sounds great! Like I said, I'm looking forward to seeing a patch.
Buschel:
Hello again,
I finalized my "first shot" at Musepack performance optimization. I compiled against current trunk (13378) and tested on my iPod Video (30G). The test file was an .mpc file with avg. 196 kbps and 348 seconds duration, decoded with the "test-codec" plugin.
Measurements:
Trunk version decodes in 154s (2.26x realtime)
Trunk version with -O1 decodes in 140s (2.49x realtime, +10%)
Changes in code and their results on top of trunk/-O1 (synth_filter.c, makefile):
Apply 32-bit multiplies for calc_new_V() -> decodes in 131s (2.65x realtime, +17%)
Apply 32-bit multiplies for windowing -> decodes in 128s (2.72x realtime, +20%)
Remark: The 32-bit multiplies are faster than the ARM assembler for 64-bit multiplies at -O1.
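For readers who want to see the trade-off, here is a rough sketch of the two kinds of fixed-point multiply. It is not the code from my patch: the function names, FRACT_BITS and PRE_SHIFT are assumptions chosen for the example. The 64-bit version keeps the full product and shifts afterwards; the 32-bit version shifts both operands by fixed amounts before multiplying, so the product fits into 32 bits but the low bits of both operands are lost.

```c
#include <stdint.h>

/* Assumed formats for illustration only. */
#define FRACT_BITS 14   /* fractional bits of the coefficient */
#define PRE_SHIFT   7   /* fixed pre-shift applied to the sample */

/* Full precision: 64-bit intermediate product, shifted afterwards. */
int32_t mul_fract_64(int32_t sample, int32_t coeff)
{
    return (int32_t)(((int64_t)sample * coeff) >> FRACT_BITS);
}

/* Reduced precision: both operands are pre-shifted by fixed amounts so
 * that a single 32-bit multiply is enough. The caller must make sure the
 * product of the shifted operands fits into 32 bits. Right-shifting
 * negative values is implementation-defined in C (arithmetic on GCC). */
int32_t mul_fract_32(int32_t sample, int32_t coeff)
{
    return (sample >> PRE_SHIFT) * (coeff >> (FRACT_BITS - PRE_SHIFT));
}
```

With sample = 1000003 and coeff = 12345, the 64-bit version returns 753481 and the 32-bit version returns 749952; this kind of rounding error is the accuracy reduction mentioned above.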
I also checked the decoder output (thanks to linuxstb for the WAV-write functionality) against the output of my Winamp decoder. I will do further testing against the trunk version.
The results show differences of at most +/- 30 (for 16-bit samples). In the spectral view the additional noise is concentrated at the filterbank band borders, as was to be expected, and peaks at roughly -90 dB. The signal itself is always roughly 40-60 dB above that, so the noise should be masked anyway.
To do: The switch to 32-bit multiplies was a first shot to explore the optimization possibilities we have. The current implementation uses fixed, predefined shifts for the DCT and window coefficient multiplies. To lose less accuracy, I propose further tweaking of the sample-by-coefficient multiplications, e.g. taking the value of each coefficient into account to choose optimized shifts before the multiplication.
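A comparison like this can be done with a few lines of C once both decoders have written their output as 16-bit PCM. The sketch below is only an assumption about how one might do it (the function name max_abs_diff is made up, and reading the WAV files into buffers is left out); it returns the largest absolute sample difference between two buffers of equal length.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest absolute difference between two 16-bit PCM buffers of n samples. */
int max_abs_diff(const int16_t *a, const int16_t *b, size_t n)
{
    int max = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen to int first so the subtraction cannot overflow int16_t. */
        int d = abs((int)a[i] - (int)b[i]);
        if (d > max)
            max = d;
    }
    return max;
}
```

A result of 30 from such a function would correspond to the +/- 30 maximum reported above.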
Next question: How do I make a patch available? :)
cu,
Buschel
nls:
If you got the source with svn, just do a:
svn diff > patchfile.patch
from the source root (where folders like apps, firmware, etc. are)
and then post it as a patch on Flyspray, our tracker: http://www.rockbox.org/tracker
Buschel:
Ok, worked fine for me. Added my first patch :o)