I had some time and ran some tests with the same file, VST-processed with crossfeed off versus crossfeed on, on the same Rockbox player, and I can confirm the difference. It's definitely the difference between the large, clean, layered sound of the ER-4s with a roomy mix and a smaller, less clean, squashed sound. The combination of room and detail is distinctly good (usually you get one or the other), and I think that's what drew me to Rockbox. I use it for doing transcriptions.
I was going to go ahead and port the Rockbox code myself, but first I wanted to ask whether you could share your code, since I did DSP in school but have never written a VST.