Yes, Rockbox uses Speex as the voice format for all players newer than the old Archos ones.
And it's not even standard Speex (I think the headers are stripped), so you should use rbspeexenc to encode those clips.
So the best option is to use either voicebox+ or rbutil to generate the files. (If you have problems with these tools on Vista, file a bug report, or try to fix it yourself :-) )