Decoding performance on iPOD

preglow:

--- Quote from: Buschel on May 13, 2007, 11:04:21 AM ---Regarding the use of doubles/float in the synthesis-filter: As far as I can see there are several scaling operations in calc_new_V() which use the macros MPC_SCALE_CONST, MPC_MULTIPLY_FRACT_CONST_FIX and similar ones. These macros are expanded with usage of the MAKE_MPC_SAMPLE_EX macro. This one will use double-conversion and -multiplication.

--- End quote ---
And if you have a look in math.h, you'll see that MAKE_MPC_SAMPLE_EX, MAKE_MPC_SAMPLE and other macros like them convert the floating-point constants to fixed-point numbers. If GCC does not do this conversion at compile time, it's even more crap than I thought it was. Looking at Calculate_New_V() in a disassembler, I can see no floating-point calls, but a lot of mac.l instructions, which indicates that things are working as they should.
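
To illustrate the point about compile-time conversion, here is a minimal sketch. The names TO_FIXED, FRACT_BITS, COEFF_HALF and scale_by_half are hypothetical stand-ins and not the actual macros from the codec's math.h, and the 14 fractional bits are an assumption. Because the macro argument is a literal, the whole expression is a constant expression, which is why it can be used as a file-scope initializer and why no floating-point code has to survive into the binary.

```c
#include <stdint.h>

/* Assumed fixed-point format: 14 fractional bits (hypothetical value). */
#define FRACT_BITS 14

/* Hypothetical stand-in for a MAKE_MPC_SAMPLE-style macro: turns a
 * floating-point literal into a fixed-point integer constant. */
#define TO_FIXED(x) ((int32_t)((x) * (double)(1L << FRACT_BITS)))

/* A file-scope initializer must be a constant expression, so this line
 * only compiles because the conversion is done at compile time. */
static const int32_t COEFF_HALF = TO_FIXED(0.5);

/* Scale a fixed-point sample by 0.5 using integer arithmetic only. */
static int32_t scale_by_half(int32_t sample)
{
    return (int32_t)(((int64_t)sample * COEFF_HALF) >> FRACT_BITS);
}
```

Calling scale_by_half() with the fixed-point value of 1.0 (16384 in this assumed format) returns 8192, the fixed-point value of 0.5.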


--- Quote from: Buschel on May 13, 2007, 11:04:21 AM ---I removed these conversions via defining additional constants and a new macro. Now either 64bit- or 32bit-multiplies are possible.

--- End quote ---
And you're sure this alone increases performance? I'll sure be looking forward to seeing a patch if this is the case.


--- Quote from: Buschel on May 13, 2007, 11:04:21 AM ---Currently I am measuring different combinations of the optimizations, including using -O1 instead of -O2. With all optimizations included (also the accuracy reduction), performance went up by as much as 18%.

--- End quote ---
Sounds great! Like I said, I'm looking forward to seeing a patch.

Buschel:
Hello again,

I have finished my "first shot" at musepack performance optimization. I compiled against current Trunk (13378) and tested on my iPOD-Video (30G). The test file was an .mpc file averaging 196 kbps with a duration of 348 seconds, decoded with the "test-codec" plugin.

Measurements:
  Trunk-Version decodes in 154s (2.26x realtime)
  Trunk-Version with -O1 decodes in 140s (2.49x realtime, +10%)
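
For reference, the figures in brackets can be reproduced from the decode times and the 348 second track length. This is only a sketch of the arithmetic; the function names are made up for illustration.

```c
/* Realtime factor: seconds of audio decoded per second of decode time. */
static double realtime_factor(double track_seconds, double decode_seconds)
{
    return track_seconds / decode_seconds;
}

/* Speedup of a new decode time over a baseline decode time, in percent. */
static double speedup_percent(double baseline_seconds, double new_seconds)
{
    return (baseline_seconds / new_seconds - 1.0) * 100.0;
}
```

realtime_factor(348, 154) gives roughly 2.26, and speedup_percent(154, 140) gives roughly 10.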

Changes in code and their results on top of Trunk/-O1 (synth_filter.c, makefile):
  Apply 32bit-multiplies for calc_new_V() -> decodes in 131s (2.65x realtime, +17%)
  Apply 32bit-multiplies for windowing -> decodes in 128s (2.72x realtime, +20%)
  Remark: The 32bit-multiplies are faster than the ARM-assembler for 64bit-multiplies on -O1
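
The difference between the two kinds of multiply can be sketched roughly as follows. This is not the code from the patch: the function names, the 14 fractional bits and the even split of the pre-shift between both operands are assumptions made for illustration only.

```c
#include <stdint.h>

#define FRACT_BITS 14               /* assumed fixed-point format */
#define PRE_SHIFT  (FRACT_BITS / 2) /* assumed: shift split evenly */

/* Full-precision variant: widen to 64 bits, multiply, then shift back. */
static int32_t mul_fixed_64(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> FRACT_BITS);
}

/* Reduced-precision variant: drop the low bits of both operands first,
 * so that a plain 32-bit multiply is enough. The caller has to make sure
 * the product of the shifted operands still fits into 32 bits. */
static int32_t mul_fixed_32(int32_t a, int32_t b)
{
    return (a >> PRE_SHIFT) * (b >> PRE_SHIFT);
}
```

With a = 12345 and b = 6789, the 64-bit variant returns 5115 and the 32-bit variant returns 5088. The bits thrown away before the multiply are where the accuracy loss comes from.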

I also checked the decoder output (thanks to linuxstb for the wav-write functionality) against the output of my WinAmp decoder. I will do further testing against the Trunk version.
The results show differences of at most +/- 30 (for 16-bit samples). In the spectral view the additional noise is concentrated at the filterbank borders, as was to be expected, and has its maximum at roughly -90dB. The signal itself is always roughly 40-60dB above that, so the noise should be masked anyway.
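
A comparison like this can be done with a few lines of C once both decoders have written their output as 16-bit PCM. The sketch below only shows the sample-by-sample comparison. Reading the wav files is left out, and max_abs_diff is a made-up name, not part of Rockbox or the test-codec plugin.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest absolute difference between two 16-bit PCM buffers of
 * equal length n. */
static int max_abs_diff(const int16_t *a, const int16_t *b, size_t n)
{
    int max = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen to int first so the subtraction cannot overflow. */
        int d = abs((int)a[i] - (int)b[i]);
        if (d > max)
            max = d;
    }
    return max;
}
```

A return value of 30 or less over the whole test file would correspond to the +/- 30 mentioned above.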

ToDo: The switch to 32bit-multiplies was a first shot to explore what optimizations are possible. The current implementation uses fixed, predefined shifts for the dct- and window-coefficient multiplies. To lose less accuracy, I propose further tweaking of the multiplications of samples and coefficients, e.g. taking the value of each coefficient into account to choose optimized shifts before the multiplication.
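
One possible reading of this idea, as a rough sketch only: instead of always dropping the same number of bits, look at how many bits the coefficient really needs and shift the sample only as far as necessary to keep the product within 32 bits. Everything here is an assumption for illustration (the names, the 14 fractional bits, the 20-bit sample range), and in a real decoder the per-coefficient shift would more likely be precomputed in a table than derived at run time.

```c
#include <stdint.h>

#define FRACT_BITS  14  /* assumed fixed-point format of the coefficients */
#define SAMPLE_BITS 20  /* assumed: |sample| is always below 2^20 */

/* Number of bits needed to hold |v| (0 for v == 0). */
static int bits_needed(int32_t v)
{
    uint32_t m = (v < 0) ? (uint32_t)(-(int64_t)v) : (uint32_t)v;
    int n = 0;
    while (m) {
        n++;
        m >>= 1;
    }
    return n;
}

/* 32-bit multiply with a coefficient-dependent pre-shift: the sample
 * loses only as many low bits as are needed for the product to stay
 * below 2^31. Small coefficients cost no accuracy at all. Assumes
 * |coeff| <= 1.0 in the fixed-point format above and an arithmetic
 * right shift for negative values, as GCC provides. */
static int32_t mul_adaptive(int32_t sample, int32_t coeff)
{
    int excess = SAMPLE_BITS + bits_needed(coeff) - 31;
    int pre = (excess > 0) ? excess : 0;
    return ((sample >> pre) * coeff) >> (FRACT_BITS - pre);
}
```

For sample = 12345 and coeff = 6789 this returns 5114. A full 64-bit multiply followed by a shift of 14 bits gives 5115, so only one unit is lost for this pair of values.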

Next question: How do I make a patch available? :)

cu,
Buschel

nls:
If you got the source with svn, just do a:

svn diff > patchfile.patch

from the source root (where folders like apps, firmware, etc are)

and then post it as a patch on flyspray, our tracker, http://www.rockbox.org/tracker

Buschel:
Ok, worked fine for me. Added my first patch :o)
