Good news!
I'll try to explain the fix: About 6 months ago I reworked this section of the code. The fixed-point implementation was totally wrong and only worked by chance. The data being multiplied and added has a 31-bit fractional part. To avoid overflow when adding up the data, I shifted it right by 2 bits (>>2) before the calculation and shifted the results back accordingly. The simple fact is: >>2 was not enough, the shift has to be >>3.
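For anyone curious about the general idea, here is a rough sketch in C. This is not the code from the commit. The names q31_mul and sum_fits_in_32 are made up for illustration, and the number of terms is only an example. It just shows how pre-shifting values with 31 fractional bits buys headroom: each bit of right shift doubles the number of near-full-scale products you can add before a 32-bit accumulator overflows.

```c
#include <stdint.h>

/* Hypothetical helper: multiply two values with 31 fractional bits.
 * The 64-bit intermediate keeps the full product; dropping 31 bits
 * brings the result back to 31 fractional bits. */
static int32_t q31_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 31);
}

/* Hypothetical check: pre-shift each input by `shift` bits, add up the
 * products, and report whether the running sum would still fit into a
 * signed 32-bit accumulator. The sum is kept in 64 bits here only so
 * that the overflow can be detected instead of silently wrapping.
 * Note: >> on negative values is implementation-defined in C, so this
 * sketch assumes non-negative inputs. */
static int sum_fits_in_32(const int32_t *x, const int32_t *c, int n, int shift)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += q31_mul(x[i] >> shift, c[i]);
        if (acc > INT32_MAX || acc < INT32_MIN)
            return 0; /* a 32-bit accumulator would have overflowed */
    }
    return 1;
}
```

With six terms that are all close to full scale (0x7FFFFFFF for both data and coefficient), sum_fits_in_32 returns 0 for a shift of 2 and 1 for a shift of 3. A shift of 2 only leaves room for four such terms, a shift of 3 for eight.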
I could explain in more depth why 3 is the correct value, but I won't go into more detail right now.
Anyway, thanks for testing!
Edit: Fixed in r29622 on trunk and r29623 on the v3.8 branch.