Rockbox Technical Forums

Support and General Use => Audio Playback, Database and Playlists => Topic started by: dconrad on July 23, 2021, 11:57:25 AM

Title: Bit depth, Dithering, and the Rockbox audio path
Post by: dconrad on July 23, 2021, 11:57:25 AM
This is entirely out of pure curiosity, but I recently learned what dithering is, and noticed Rockbox has this capability. That got me thinking, I know parts of the RB audio path/pipeline are fixed to 16-bit. What does the audio path's bit depth look like, anyway? Maybe there's already a thread that explains this, but I did a quick search and didn't get much, especially within the last decade.

Say I've got a 32-bit FLAC file. I understand most of the sound processing stuff is 32-bit, so does it stay 32-bit all the way through the DSP stuff and then get truncated/dithered once to 16-bit, or does it get converted to 16-bit, upped to 32-bit (if stuff is enabled), and then converted again to 16-bit?

If the latter, is dithering applied both times the audio is converted down?

Another quite a bit more specific question... is dithering twice a bad thing, if the audio must be converted to lower bit depth twice? I'm working on getting higher bit depth volume working on the Eros Q, and it will go 16 --> 32 --> 24 (that's the plan, anyway...), could/should it be dithered there as well? Once I get the basic functionality working, anyway.

Edit: Or, really the ideal case would probably be to do volume scaling direct to 24 and avoid that whole question anyway, but I'd have to understand the math well enough to make that happen  ;D One step at a time...
Title: Re: Bit depth, Dithering, and the Rockbox audio path
Post by: 7o9 on July 23, 2021, 12:26:08 PM
I am not a mathematician so I googled.

Reading this explained to me what you are trying to decide: https://www.waves.com/audio-dithering-what-you-need-to-know

Quantization from higher bit depth to lower does not require dithering as a rule. Doing it multiple times adds noise each time. If you go up from 16/24 to 32 you are not losing any information. Changing the volume at 32 and then going back to the original should still not lose information. Going from a 24 original to 16 does, as you have to perform quantization: https://en.wikipedia.org/wiki/Quantization_(signal_processing).
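
A minimal sketch of that distinction, assuming signed integer samples that are widened by left-justifying them in the larger word (the function names are made up for illustration). It only covers the case where the samples are left unchanged between widening and narrowing:

```python
def widen_16_to_32(sample16: int) -> int:
    """Left-justify a signed 16-bit sample in a 32-bit word."""
    return sample16 << 16


def narrow_32_to_16(sample32: int) -> int:
    """Keep the top 16 bits; the low 16 bits are discarded."""
    return sample32 >> 16


def narrow_24_to_16(sample24: int) -> int:
    """Keep the top 16 bits of a 24-bit sample; the low 8 bits are discarded."""
    return sample24 >> 8


if __name__ == "__main__":
    # Widening then narrowing returns every 16-bit value unchanged.
    lossless = all(narrow_32_to_16(widen_16_to_32(s)) == s
                   for s in range(-32768, 32768))
    print("16 -> 32 -> 16 lossless:", lossless)
    # Two different 24-bit samples land on the same 16-bit value.
    print(narrow_24_to_16(0x000100), narrow_24_to_16(0x0001FF))
```

Going up and straight back down is a pure shift, so nothing is lost. Going from 24 to 16 throws away the low 8 bits, which is the quantization step where dither becomes a question.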

In your example, why go down to 24 bit again?
Title: Re: Bit depth, Dithering, and the Rockbox audio path
Post by: dconrad on July 23, 2021, 12:39:08 PM
Quote
In your example, why go down to 24 bit again?

For reasons only Ingenic know, they limited the audio module in the X1000 processor to 24-bit samples, so that's the highest bit depth we can send to the DAC. Not bad by any means, but just like... why not 32?
Title: Re: Bit depth, Dithering, and the Rockbox audio path
Post by: saratoga on July 23, 2021, 05:02:25 PM
Quote
This is entirely out of pure curiosity, but I recently learned what dithering is, and noticed Rockbox has this capability. That got me thinking, I know parts of the RB audio path/pipeline are fixed to 16-bit. What does the audio path's bit depth look like, anyway? Maybe there's already a thread that explains this, but I did a quick search and didn't get much, especially within the last decade.

Quote
Say I've got a 32-bit FLAC file. I understand most of the sound processing stuff is 32-bit, so does it stay 32-bit all the way through the DSP stuff and then get truncated/dithered once to 16-bit, or does it get converted to 16-bit, upped to 32-bit (if stuff is enabled), and then converted again to 16-bit?

I don't think 32 bit FLAC exists, but ignoring that, since these are 32 bit processors without floating point, everything is done natively in 32 bit.  At the end of processing it is converted to 16 bit in order to double the PCM buffer capacity and because most hardware expects 16 bit samples.  For the most part, samples smaller than 32 bit are not used because they're slower on most of our hardware.

Quote
Another quite a bit more specific question... is dithering twice a bad thing, if the audio must be converted to lower bit depth twice? I'm working on getting higher bit depth volume working on the Eros Q, and it will go 16 --> 32 --> 24 (that's the plan, anyway...), could/should it be dithered there as well? Once I get the basic functionality working, anyway.

You only dither when reducing bit depth, never increasing.  In this case though it's not needed since nothing can actually decode a 24 bit sample, so the dither would be lost anyway.
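
For readers new to the idea, here is a rough sketch of dithering on a 32 to 16 bit reduction. It uses plain TPDF (triangular) dither as a generic textbook example, which is an assumption for illustration and not a description of what Rockbox's own dither does. All names are hypothetical:

```python
import random

LSB16 = 1 << 16  # one 16-bit step, measured in 32-bit sample units


def truncate_to_16(sample32: int) -> int:
    """Drop the low 16 bits with no dither."""
    return sample32 >> 16


def dither_to_16(sample32: int, rng: random.Random) -> int:
    """Add triangular noise, round to the nearest 16-bit step, then clamp."""
    # Two uniform draws summed give triangular (TPDF) noise of about +/- 1 LSB.
    noise = rng.randrange(LSB16) + rng.randrange(LSB16) - LSB16
    rounded = (sample32 + noise + LSB16 // 2) >> 16
    return max(-32768, min(32767, rounded))


def mean_output(sample32: int, count: int = 20000, seed: int = 1) -> float:
    """Average many dithered conversions of the same input sample."""
    rng = random.Random(seed)
    return sum(dither_to_16(sample32, rng) for _ in range(count)) / count
```

A constant input of half a 16-bit step (`1 << 15`) always truncates to 0, so that detail is simply gone. With dither the output flips between 0 and 1, and its average comes out near one half, so the sub-LSB level survives as noise instead of being erased.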

Quote
For reasons only Ingenic know, they limited the audio module in the X1000 processor to 24-bit samples, so that's the highest bit depth we can send to the DAC. Not bad by any means, but just like... why not 32?

Most real world hardware has somewhere between 14 and 18 bits of usable resolution, so values larger than 24 bit do not make sense.  They simply waste memory.
Title: Re: Bit depth, Dithering, and the Rockbox audio path
Post by: dconrad on July 23, 2021, 06:10:50 PM
Quote
I don't think 32 bit FLAC exists, but ignoring that, since these are 32 bit processors without floating point, everything is done natively in 32 bit.  At the end of processing it is converted to 16 bit in order to double the PCM buffer capacity and because most hardware expects 16 bit samples.  For the most part, samples smaller than 32 bit are not used because they're slower on most of our hardware.

So that confirms that the current audio path only goes down to 16-bit once?

Quote
You only dither when reducing bit depth, never increasing.  In this case though its not needed since nothing can actually decode a 24 bit sample, so the dither would be lost anyway.

Right. So, in my question, I am reducing bit depth a second time (from 32 to 24), so is it (theoretically) advisable to dither a second time?

Quote
Most real world hardware has somewhere between 14 to 18 bits, so values larger than 24 bit do not make sense.  They simply waste memory.

That's the weird thing. For 24 bit mode, the X1000 AIC wants data in a 32-bit container, and if I recall correctly, even the I2S output is padded with zeroes out to 32 bits. So it's not like they're saving memory, as far as I can tell. And yeah, the PCM5102A DAC accepts 32-bit data, though what it does internally I don't know.
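
To make "24 bits in a 32-bit container" concrete, here is a small sketch of one possible layout: the sample left-justified in the word with a zero low byte. Which layout the X1000 AIC actually expects is not stated in this thread, so treat the layout and the function names as assumptions:

```python
def to_container(sample24: int) -> int:
    """Put a signed 24-bit sample in the top three bytes of a 32-bit word."""
    return (sample24 & 0xFFFFFF) << 8


def from_container(word32: int) -> int:
    """Recover the signed 24-bit sample from a left-justified 32-bit word."""
    value = (word32 >> 8) & 0xFFFFFF
    # Bit 23 is the sign bit of the 24-bit value.
    return value - (1 << 24) if value & 0x800000 else value


if __name__ == "__main__":
    for sample in (-(1 << 23), -1, 0, 1, (1 << 23) - 1):
        word = to_container(sample)
        print(f"{sample:>9} -> 0x{word:08X} -> {from_container(word)}")
```

Either way the container is still four bytes per sample, which is the point above: the 24-bit limit does not save any memory.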
Title: Re: Bit depth, Dithering, and the Rockbox audio path
Post by: saratoga on July 23, 2021, 06:31:52 PM
It doesn't matter what you do if you're converting down to 24 bit, it'll have no effect. If you were converting to something lower I would do it each time.

Zero padding to 32 bits is probably meant to make it less annoying to send, since natively most hardware doesn't support 24 bit operations.
Title: Re: Bit depth, Dithering, and the Rockbox audio path
Post by: amachronic on July 24, 2021, 08:35:28 AM
It looks like PCM_SW_VOLUME_FRACBITS = 15 by default but it can go up to 16 without the need for 64-bit math. I am no expert here, but it looks to me like the ALSA software volume implementation is effectively working with 16 fractional bits. Maybe this is the reason why the Eros Q is getting artifacts on native but not with hosted ALSA.

To me it seems more likely that this is a precision issue -- after all, if you play back a very quiet file at full volume, the data getting sent to the DAC would be "the same" as playing a loud file at low volume. I would be surprised if you got artifacts on the quiet file. If you did I'm not sure why 24 bit output would solve the problem, unless there's some hardware voodoo going on. You could actually use 'sox' to apply a volume correction to a file and test out this theory. (According to its man page it does dithering after volume correction whereas rockbox's software volume doesn't, so that might affect the results slightly.)

Side note, the X1000 is fast enough you could use 64 bit math if you want to test even higher fracbits (it goes up to 31 or 32, I think). That'll probably drain the battery a bit faster though, so I wouldn't recommend it as a permanent fix.
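
A sketch of why 16 fractional bits would be the ceiling for 32-bit math, assuming 16-bit input samples and a maximum gain of 1.0 (both assumptions, and the helper names are hypothetical rather than taken from the Rockbox source). The volume becomes an integer factor, and the sample times the factor has to fit in a signed 32-bit word before it is shifted back down:

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def volume_factor(gain: float, fracbits: int) -> int:
    """Express a linear gain as an integer with `fracbits` fractional bits."""
    return round(gain * (1 << fracbits))


def scale_sample(sample16: int, factor: int, fracbits: int) -> int:
    """Multiply, then shift the fractional bits back out."""
    return (sample16 * factor) >> fracbits


def product_fits_int32(fracbits: int, max_gain: float = 1.0) -> bool:
    """Check the intermediate product for the most extreme 16-bit samples."""
    factor = volume_factor(max_gain, fracbits)
    extremes = (-32768 * factor, 32767 * factor)
    return all(INT32_MIN <= p <= INT32_MAX for p in extremes)


if __name__ == "__main__":
    for bits in (15, 16, 17):
        print(bits, product_fits_int32(bits))
```

Under these assumptions 15 and 16 fractional bits fit, while 17 needs a wider intermediate, which would mean 64-bit math on a 32-bit CPU.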
Title: Re: Bit depth, Dithering, and the Rockbox audio path
Post by: dconrad on July 24, 2021, 11:57:22 PM
Quote
It looks like PCM_SW_VOLUME_FRACBITS = 15 by default but it can go up to 16 without the need for 64-bit math. I am no expert here, but it looks to me like the ALSA software volume implementation is effectively working with 16 fractional bits. Maybe this is the reason why the Eros Q is getting artifacts on native but not with hosted ALSA.

Interesting thought. I just did a couple quick tests, setting FRACBITS to 16 didn't seem to change much (and 31 wouldn't even boot, it just crashed). The hosted port is using HAVE_ALSA_32BIT, so the audio is scaled up to 32-bit before being handed over to ALSA, and the volume is applied as part of that scaling (dig_vol_mult_l and dig_vol_mult_r). I'm pretty sure that's why the hosted port is much better about audio artifacts, there's just a lot more resolution at the bottom end of the dynamic scale.

In fact, the hosted port was just as bad as the native port in terms of audio artifacting prior to gerrit patch #3312. It still has a few artifacts in certain situations (that's part of the reason I pursued a native port), but it's a lot better now.

Well, as far as we know, ALSA does seem to bring it down to 24-bit for transfer to the DAC, but I think the point stands.

Though now I do wonder if there's a flaw in the way that the volume scaling is applied in RB, and 16-bit volume could work without artifacting... That I don't know.

Quote
To me it seems more likely that this is a precision issue -- after all, if you play back a very quiet file at full volume, the data getting sent to the DAC would be "the same" as playing a loud file at low volume. I would be surprised if you got artifacts on the quiet file. If you did I'm not sure why 24 bit output would solve the problem, unless there's some hardware voodoo going on. You could actually use 'sox' to apply a volume correction to a file and test out this theory. (According to its man page it does dithering after volume correction whereas rockbox's software volume doesn't, so that might affect the results slightly.)

Quote
Side note, the X1000 is fast enough you could use 64 bit math if you want to test even higher fracbits (it goes up to 31 or 32, I think). That'll probably drain the battery a bit faster though, so I wouldn't recommend it as a permanent fix.

I think an important difference is that a quiet file still has to go through the main 16-bit part path, whereas the volume scaling, if done at a higher bit depth, doesn't have to go back down to 16 bits - it's done after that.
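
A toy illustration of that resolution argument, assuming a volume factor with 16 fractional bits and a gain of 1/256 (roughly -48 dB). The function names are invented for the sketch and are not the hosted port's code:

```python
FRACBITS = 16  # assumed number of fractional bits in the volume factor


def volume_to_16(sample16: int, factor: int) -> int:
    """Apply volume and return to 16 bits; the fractional bits are dropped."""
    return (sample16 * factor) >> FRACBITS


def volume_to_32(sample16: int, factor: int) -> int:
    """Apply volume while widening; the product is already a 32-bit sample."""
    return sample16 * factor


def distinct_levels(scale, factor: int) -> int:
    """Count how many different outputs the full 16-bit input range produces."""
    return len({scale(s, factor) for s in range(-32768, 32768)})


if __name__ == "__main__":
    factor = 1 << (FRACBITS - 8)  # gain of 1/256
    print("16-bit output levels:", distinct_levels(volume_to_16, factor))
    print("32-bit output levels:", distinct_levels(volume_to_32, factor))
```

At that gain the 16-bit path collapses the input onto 256 output levels, while the 32-bit path keeps all 65536 distinct. Whether the output stage can resolve that difference is a separate question.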
Title: Re: Bit depth, Dithering, and the Rockbox audio path
Post by: saratoga on July 25, 2021, 12:16:21 PM
Not sure I follow exactly what you're trying to do here, but FRACBITS is the position of the decimal point in each fixed point sample.  In config.h it is defined to be 16 bits, so you have 16 integer bits and then 16 fractional bits for a total of 32 bits.  Since your DAC can only do ~ 17 or 18 bits, this is orders of magnitude more precision than you actually need and adding more will do nothing. 
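
As a quick illustration of a 16.16 fixed-point layout (16 integer bits, 16 fractional bits), here is a sketch with hypothetical helper names, assuming FRACBITS is 16 as described above:

```python
FRACBITS = 16
ONE = 1 << FRACBITS  # the fixed-point representation of 1.0


def to_fixed(value: float) -> int:
    """Convert a real number to 16.16 fixed point."""
    return round(value * ONE)


def to_float(fixed: int) -> float:
    """Convert a 16.16 fixed-point value back to a real number."""
    return fixed / ONE


def split(fixed: int) -> tuple:
    """Return (integer part, fractional bits) of a fixed-point value."""
    return fixed >> FRACBITS, fixed & (ONE - 1)
```

Here `to_fixed(2.5)` is the integer 163840, which `split` breaks into an integer part of 2 and fractional bits 0x8000 (one half). The smallest step is 1/65536.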

What is the problem you are actually trying to solve?  What artifacts are you talking about?  Loud samples clipping?  Quiet samples clamping to zero?  Something else?
Title: Re: Bit depth, Dithering, and the Rockbox audio path
Post by: dconrad on July 25, 2021, 10:16:41 PM
Mostly I just started this thread due to my interest in when it's appropriate to use dithering and what the current RB audio chain actually looks like in terms of bit depth. Like I said, essentially pure curiosity.

However, we did get a little sidetracked... The artifacting issue I'm referring to is that with the standard software volume scaling on the Eros Q (Native, and prior to Gerrit patch #3312, hosted as well), there is popping/clicking in dynamic portions of music/podcasts, particularly when a louder portion follows a quieter portion. This was particularly bad at low volumes, but could be heard quite a way up into the volume range.

Quote
Since your DAC can only do ~ 17 or 18 bits

Where do you get this information? I don't seem to see it in the datasheet.
Title: Re: Bit depth, Dithering, and the Rockbox audio path
Post by: saratoga on July 25, 2021, 11:02:23 PM
Distortion on loud parts is usually integer overflow.  Since the volume control works for other targets, that probably means the fixed-point format you're handing the code isn't what the code expects it to be.  Have you tried playing back a full scale sine wave, then a 50% (-1 bit), then a 25% (-2 bits)... sine until you stop hearing distortion?  If it goes away at some point you know how many bits the format is off by.
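
One way to produce those test tones, sketched with the Python standard library. The 1 kHz frequency, 44.1 kHz rate, one-second length, and file names are arbitrary choices for illustration:

```python
import math
import struct
import wave


def make_tone(shift: int, count: int = 44100,
              freq: float = 1000.0, rate: int = 44100) -> list:
    """16-bit sine whose peak is full scale shifted right by `shift` bits."""
    peak = 32767 >> shift
    step = 2.0 * math.pi * freq / rate
    return [round(peak * math.sin(step * i)) for i in range(count)]


def write_wav(path: str, samples: list, rate: int = 44100) -> None:
    """Write mono signed 16-bit little-endian samples to a WAV file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(struct.pack("<%dh" % len(samples), *samples))


if __name__ == "__main__":
    # Full scale, then -1 bit, -2 bits, -3 bits.
    for shift in range(4):
        write_wav("sine_minus_%d_bits.wav" % shift, make_tone(shift))
```

Playing the files in order at a fixed volume setting should show the first shift at which the distortion disappears.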

17-18 bits is about the practical limit for a high end DAC + amp used in a portable player.  I didn't look up what device you're using, but if >16 bit helps it is probably in this range.