Rockbox Technical Forums

Rockbox Development => Starting Development and Compiling => Topic started by: TP Diffenbach on July 08, 2006, 12:29:16 PM

Title: Arm Asm memcpy progress
Post by: TP Diffenbach on July 08, 2006, 12:29:16 PM: I've been working on an ARM assembly memcpy for iPod targets. Currently it is faster on large copy lengths than either the Linux kernel's or uClinux's memcpy (the slope of the line is lower, so the absolute difference gets better and better the longer the copy length). It is also faster than the C memcpy we currently use.

For all copy lengths, the absolute time is always lower than the Linux kernel's memcpy.

For word-aligned data or same-aligned data (same-aligned: (dst & 3) != 0 && (dst & 3) == (src & 3)), the absolute time is also always less than, or only negligibly higher than, uClinux's (and considerably less as we reach longer copy lengths). But uClinux's memcpy is faster in absolute terms for mixed-aligned data ((dst & 3) != (src & 3)) when the copy length is < ~64 bytes.
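
To make the three alignment cases concrete, here is a minimal C sketch. It assumes 4-byte words, so the low two bits of an address (address & 3) give its offset within a word. The names classify_copy, head_bytes and the enum values are hypothetical and only for illustration; this is not the assembly routine itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, used only to illustrate the three cases. */
enum copy_alignment {
    COPY_WORD_ALIGNED,  /* both pointers sit on a 4-byte boundary */
    COPY_SAME_ALIGNED,  /* same non-zero offset within a word */
    COPY_MIXED_ALIGNED  /* different offsets within a word */
};

static enum copy_alignment classify_copy(const void *dst, const void *src)
{
    uintptr_t d = (uintptr_t)dst & 3u;  /* offset of dst within its word */
    uintptr_t s = (uintptr_t)src & 3u;  /* offset of src within its word */

    if (d != s)
        return COPY_MIXED_ALIGNED;
    return d == 0 ? COPY_WORD_ALIGNED : COPY_SAME_ALIGNED;
}

/* Number of leading bytes to copy one at a time before both pointers
 * reach a word boundary. Only meaningful when the offsets match. */
static size_t head_bytes(const void *dst, const void *src)
{
    assert(classify_copy(dst, src) != COPY_MIXED_ALIGNED);
    return (size_t)((4u - ((uintptr_t)dst & 3u)) & 3u);
}
```

In the same-aligned case a few leading byte copies bring both pointers onto a word boundary together, after which whole-word copies are possible. In the mixed-aligned case no number of leading bytes aligns both pointers at once, which is presumably why that case behaves differently at short lengths.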

My worry is that short copy lengths will predominate for mixed aligned data.

So this is the good news:
(http://img301.imageshack.us/img301/6120/mineuclinux2zl.th.jpg) (http://img301.imageshack.us/my.php?image=mineuclinux2zl.jpg)

And this is the bad news (this is the same graph, showing only copy lengths from 0-64):
(http://img296.imageshack.us/img296/6770/mineuclinux645em.th.jpg) (http://img296.imageshack.us/my.php?image=mineuclinux645em.jpg)

Here's the bad news, isolating only the unaligned copies:
(http://img461.imageshack.us/img461/2451/mineuclinux64unaligned4yf.th.jpg) (http://img461.imageshack.us/my.php?image=mineuclinux64unaligned4yf.jpg)
All times are in microseconds (millionths of a second) for 100 calls to each function. The X axis is the number of bytes copied (that is, the third argument to memcpy).
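
For anyone wanting to reproduce this kind of measurement, here is a rough sketch of timing a batch of calls in microseconds. It assumes a POSIX host with clock_gettime rather than the iPod's own timer, and the names time_copies_us and memcpy_fn are hypothetical. It is not the harness used for the graphs above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any function with memcpy's signature can be plugged in here. */
typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Returns the elapsed time in microseconds for `calls` back-to-back
 * copies of n bytes from src to dst using fn. */
static double time_copies_us(memcpy_fn fn, void *dst, const void *src,
                             size_t n, int calls)
{
    struct timespec t0, t1;

    assert(calls > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < calls; i++)
        fn(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) * 1e6
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
}
```

Calling time_copies_us(memcpy, dst, src, n, 100) for each n of interest, and again with the function under test, gives one data point per copy length. Offsetting dst and src by 0 to 3 bytes covers the aligned, same-aligned and mixed-aligned cases.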
Title: Re: Arm Asm memcpy progress
Post by: Tiennou on July 09, 2006, 06:01:09 PM: Great work! :)