That coming from you is particularly odd. I first ran Rockbox 15 years ago on absolute trash hardware. A great UI does not require any resources at all, if done right. The codec will require more.
Literally a handful of lines of logically arranged text do the trick. I haven't used Rockbox in many years, but I remember being very impressed with what you had back then. Doom and the like were funny, but of course we don't need that.
The "absolute trash hardware" from 15 years ago had about 8x the RAM of more modern shovelware players.
And sure, we can save a non-trivial amount of runtime RAM if we ditch the themable UI. And free up even more RAM if we go back to HWCODEC only. And make it so you can play music OR do anything else (e.g. browse files). Oh, and drop the translation engine and voiced UIs. And so forth.
The point being, by the time we do all of that work to completely rewrite [note this is _not_ an exaggeration] the codebase to make something _fit_ into the limited RAM of these shovelware devices, what's left is no longer recognizable as "rockbox" any more, lacking all of the cool features that folks expect to get. So what exactly is the point of doing all of that work?
Incidentally, the least capable port we have today, i.e. the Sansa Clip with 2MB of RAM, no SD card, and a 128x64 monochrome screen, needs 344KB of RAM to even _boot_. This doesn't include memory for (any!) buffers -- e.g. file/directory caches, SD DMA buffers, theme resources, etc. -- or things like codec or plugin scratch space.
Then consider that most of these shovelware players have 224K (or less) of _total_ RAM, and we have _zero_ documentation on how to use them.
Edit: The Sansa Clip has 2MB, not 2KB.
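To put the numbers above side by side, here is a rough back-of-the-envelope sketch. It uses only the figures quoted in this comment; the constant and function names are made up for illustration, it assumes 1MB = 1024KB, and it treats the Clip's 344KB boot footprint as if it carried over unchanged to other hardware, which is itself an assumption.

```python
# Figures quoted in the comment above, in KB (assuming 1 MB = 1024 KB).
CLIP_TOTAL_KB = 2 * 1024      # Sansa Clip: 2MB of RAM
BOOT_FOOTPRINT_KB = 344       # RAM needed just to boot on the Clip
SHOVELWARE_TOTAL_KB = 224     # upper figure quoted for shovelware players


def headroom_kb(total_kb: int, boot_kb: int = BOOT_FOOTPRINT_KB) -> int:
    """RAM left after boot, before any buffers; negative means it cannot boot."""
    return total_kb - boot_kb


if __name__ == "__main__":
    devices = (("Sansa Clip", CLIP_TOTAL_KB), ("shovelware", SHOVELWARE_TOTAL_KB))
    for name, total in devices:
        print(f"{name}: {total} KB total, {headroom_kb(total)} KB left after boot")
```

So even before any buffers, codecs, or plugins, the boot footprint alone overshoots the best-case shovelware device by 120KB.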