Arm Asm memcpy progress

TP Diffenbach:
I've been working on an ARM assembly memcpy for iPod targets. Currently it is faster on large copy lengths than either the Linux kernel's or uClinux's memcpy: the slope of its line is shallower, so the absolute difference gets better and better the longer the copy length. (It is also faster than the C memcpy we currently use.)

For all copy lengths, the absolute time taken is always lower than that of the Linux kernel's memcpy.

For word-aligned data or same-aligned data (same-aligned: (dst & 3) != 0 && (dst & 3) == (src & 3)), the absolute time is also always less than, or only negligibly higher than, uClinux's (and considerably less as we reach longer copy lengths). But uClinux's memcpy is faster in absolute terms for mixed-aligned data ((dst & 3) != (src & 3)) when the copy length is < ~64 bytes.
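To make the three alignment cases concrete, here is a rough C sketch of how a copy could be classified from the low two bits of each pointer. The enum and function names are made up for illustration; they are not taken from the assembly routine or from Rockbox.

```c
#include <stdint.h>

/* Hypothetical labels for the three cases described above. */
enum copy_alignment {
    ALIGN_WORD,  /* both pointers sit on a 4-byte boundary */
    ALIGN_SAME,  /* both are off a boundary by the same 1..3 bytes */
    ALIGN_MIXED  /* the two offsets differ */
};

/* Classify a copy by the low two bits of each address. */
enum copy_alignment classify_alignment(const void *dst, const void *src)
{
    uintptr_t d = (uintptr_t)dst & 3;  /* dst offset within its word */
    uintptr_t s = (uintptr_t)src & 3;  /* src offset within its word */

    if (d != s)
        return ALIGN_MIXED;
    return (d == 0) ? ALIGN_WORD : ALIGN_SAME;
}
```

In the same-aligned case a few leading byte copies bring both pointers onto a word boundary at once; in the mixed case that is not possible, which is why that case is the awkward one.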

My worry is that short copy lengths will predominate for mixed aligned data.

So this is the good news:

And this is the bad news (this is the same graph, showing only copy lengths from 0-64):

Here's the bad news, isolating only the unaligned copies:

All times are in microseconds (millionths of a second) for 100 calls to each function. The X axis is the number of bytes copied (that is, just the third argument to memcpy).
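For anyone wanting to reproduce this kind of measurement, a minimal sketch of the timing loop might look like the following. It is an assumption-laden host-side version: it uses the standard C clock() call rather than whatever timer was used on the iPod, and the names memcpy_fn and time_calls_us are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any function with memcpy's signature can be passed in. */
typedef void *(*memcpy_fn)(void *dst, const void *src, size_t len);

/* Return the elapsed CPU time, in microseconds, for `calls`
 * back-to-back copies of `len` bytes using `fn`.
 * Resolution is limited by CLOCKS_PER_SEC on the host. */
double time_calls_us(memcpy_fn fn, void *dst, const void *src,
                     size_t len, int calls)
{
    clock_t start = clock();
    for (int i = 0; i < calls; i++)
        fn(dst, src, len);
    clock_t end = clock();

    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC;
}
```

Calling time_calls_us(memcpy, dst, src, len, 100) for each len, and again with dst or src offset by 1 to 3 bytes, would give one data point per copy length and alignment case.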

Tiennou:
Great work! :)
