
Beginning steps in kernel disassembly and analysis


dconrad:
With respect to amachronic's post in the Eros Q/K thread about how to go about reverse engineering the X1000 linux kernel, I thought I might start a dedicated thread if I'm going to keep pursuing this. I'm pretty sure I will have lots of really basic/beginner questions, and should probably put them here rather than stay in that thread where they're probably off-topic. And besides, maybe they'll help somebody in the future anyway.

To be honest, I don't expect to fully complete this on my own, but I thought I might get the ball rolling, and see if we make any progress.

So far I've successfully pulled the following files out of H2-v13-patched.upt, the rockbox-bootloader-patched update file from the ErosQ/K wiki page. I'm using this one only because that's the particular device I have.

Out of the .upt file (which, it turns out, is really just an ISO with a different file extension), I got:


--- Code: ---
VERSION.TXT
UPDATE.TXT
SYSTEM.UBI
UIMAGE.BIN
UBOOT.BIN
_GITIGNO
--- End code ---

So far so good, right?

Then I extracted the UIMAGE.BIN file as recommended with binwalk -e and got:


--- Code: ---
[user@localhost erosq kernel]$ binwalk -e UIMAGE.BIN

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             uImage header, header size: 64 bytes, header CRC: 0xD30BDBEF, created: 2020-03-09 06:41:44, image size:
 2612785 bytes, Data Address: 0x80010000, Entry Point: 0x80419300, data CRC: 0xF4F43C27, OS: Linux, CPU: MIPS, image type: OS Kernel Image,
compression type: gzip, image name: "Linux-3.10.14"
64            0x40            gzip compressed data, maximum compression, from Unix, last modified: 1970-01-01 00:00:00 (null date)

[user@localhost erosq kernel]$
--- End code ---

This spit out a 5.4 MiB file called "40" (named after the hex offset, 0x40, where binwalk found the gzip data). I imported this into Ghidra with the recommended MIPS LE 32-bit architecture and a base address of 0x80010000. Ghidra seemed to like this fine, and took maybe 15-20 minutes to crunch through the file when I imported it. (My computer is old.)
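
For anyone following along without binwalk, here is a minimal Python sketch of the same extraction step. It assumes the legacy U-Boot uImage layout (64-byte header, payload length stored as a big-endian 32-bit value at offset 12) and a gzip-compressed payload, as in the binwalk output above. The function names extract_kernel and va_to_offset are made up for illustration.

```python
import struct
import zlib

UIMAGE_HEADER_SIZE = 64  # binwalk reported "header size: 64 bytes"


def extract_kernel(uimage: bytes) -> bytes:
    """Return the decompressed payload of a gzip-compressed legacy uImage."""
    # Assumption: payload length is a big-endian u32 at header offset 12.
    (payload_size,) = struct.unpack_from(">I", uimage, 12)
    start = UIMAGE_HEADER_SIZE
    payload = uimage[start:start + payload_size]
    # 16 + MAX_WBITS tells zlib to expect a gzip wrapper around the data.
    return zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(payload)


def va_to_offset(va: int, load_addr: int = 0x80010000) -> int:
    """Map a kernel virtual address to an offset in the decompressed file.

    The default load address is the "Data Address" from the header above.
    """
    return va - load_addr
```

With these, extract_kernel(open("UIMAGE.BIN", "rb").read()) should produce the same bytes as the file binwalk called "40", and va_to_offset(0x80419300) gives the file offset of the entry point.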

So my first sanity check question - it looks like there's a whole lot of what I think are no-ops before the first function, does this look right? I'll attach a couple screenshots to show what I'm getting. Am I on the right track?

amachronic:
hmm. Looks like you have a u-boot image there as well, and this kernel isn't self-decompressing (u-boot is doing that instead). You can pull the kernel entry address out of the uImage header in this case...


--- Code: ---
/* xImage */
00000000  27 05 19 56 ee 20 85 b0  5d c5 14 1d 00 33 b0 00  |'..V. ..]....3..|
00000010  80 f0 00 00 80 f0 00 00  2a d2 2d 06 05 05 02 00  |........*.-.....|
          ^^ startup  ^^ and entry address here
00000020  4c 69 6e 75 78 2d 33 2e  31 30 2e 31 34 2d 73 76  |Linux-3.10.14-sv|
00000030  6e 32 39 36 00 00 00 00  00 00 00 00 00 00 00 00  |n296............|

--- End code ---

Not too sure which is what. It should be obvious in your case because the entry address must be higher than the load address. The kernel entry is in arch/mips/kernel/head.S, fittingly called kernel_entry. It's assembly so you can verify you got it by comparing the instructions. The function you're looking at isn't it. Control will transfer to C code by jumping to start_kernel at the end of kernel_entry and that'll give you a known starting point.
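
As a rough illustration of reading those header fields, here is a Python sketch that unpacks the 64-byte legacy uImage header. It assumes the standard U-Boot field order (magic, header CRC, timestamp, data size, load address, entry point, data CRC, then four one-byte fields and a 32-byte name), all big-endian. Under that layout the load address sits at offset 0x10 and the entry point at offset 0x14. The class and function names are hypothetical.

```python
import struct
from typing import NamedTuple

UIMAGE_MAGIC = 0x27051956  # first four bytes of the hex dump above


class UImageHeader(NamedTuple):
    magic: int
    header_crc: int
    timestamp: int
    data_size: int
    load_addr: int      # offset 0x10
    entry_point: int    # offset 0x14
    data_crc: int
    os: int
    arch: int
    image_type: int
    compression: int
    name: str


def parse_uimage_header(raw: bytes) -> UImageHeader:
    """Decode the first 64 bytes of a legacy uImage file."""
    # Seven big-endian u32 fields, four u8 fields, then a 32-byte name.
    fields = struct.unpack(">7I4B32s", raw[:64])
    if fields[0] != UIMAGE_MAGIC:
        raise ValueError("not a legacy uImage header")
    name = fields[11].split(b"\0", 1)[0].decode("ascii", "replace")
    return UImageHeader(*fields[:11], name)
```

Feeding it the 64 bytes from the dump above gives a load address and entry point that are both 0x80f00000, so that particular image does not show which field is which; the UIMAGE.BIN header from the first post, where the two values differ, does.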

To give you an idea of something you're looking for -- the goal is not to disassemble the entire kernel after all  ;) -- check out this file: arch/mips/xburst/soc-x1000/chip-x1000/halley2/common/board_base.c. There's a function in it called board_base_init(). That's a function you want to locate, because it's the only place you can find a reference to platform_devices_array. Most platform_data structs which driver probe() functions will reference, are pointed to by this array, and from the platform_data struct you can learn GPIOs, details of the LCD interface configuration, clock speeds, and other useful stuff. I2C devices will probably be registered by board_base_init too. But some stuff will be hardcoded willy-nilly in the drivers. (I guess for a kernel you never update, maintainability is not a problem...)

Since board_base_init is an initcall it can be located in a table. See init/main.c, look for the initcall_levels array. And then look at the initcall_level_names below it. You can easily find the names array because it references a unique string "postcore". Once you find that you can find do_initcall_level() and from that, find initcall_levels. The initcalls are set up in one big table and the first entry of initcall_levels will point at the first one. You can trawl through the init calls one by one until you locate board_base_init(). That may require identifying a few other functions first, but the payoff is worth it because some information simply can't be found any other way.
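
The "find the unique string, then find what points at it" step can be sketched in Python roughly as follows. This is only an illustration, assuming the decompressed image is loaded at 0x80010000 and that initcall_level_names is a plain array of 32-bit little-endian pointers. It will not catch references that code builds with lui/addiu pairs. find_string_xrefs is a made-up helper name.

```python
import struct


def find_string_xrefs(image: bytes, needle: bytes, base: int = 0x80010000):
    """Return (string_va, pointer_va) pairs for a NUL-terminated string.

    string_va is where the string lives; pointer_va is a 4-byte-aligned
    location holding a little-endian pointer to it.
    """
    results = []
    target = needle + b"\0"
    pos = image.find(target)
    while pos != -1:
        # Skip matches that are only the tail of a longer string.
        if pos == 0 or image[pos - 1] == 0:
            string_va = base + pos
            pointer = struct.pack("<I", string_va)
            hit = image.find(pointer)
            while hit != -1:
                if hit % 4 == 0:  # data pointers are word aligned
                    results.append((string_va, base + hit))
                hit = image.find(pointer, hit + 1)
        pos = image.find(target, pos + 1)
    return results
```

Running find_string_xrefs(image, b"postcore") on the decompressed kernel should give a short list of candidate pointer locations; the one sitting in the middle of a run of pointers to the other level names is the likely initcall_level_names entry.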

Some other random notes:

* Try to identify BUG_ON() and WARN_ON() statements. They are macros so this requires a bit of digging in the Linux source to match it up with the binary, but they can give you exact file and line number information and the source files using those macros are unlikely to have been changed. Many Linux API functions can be discovered this way.
* You can find many GPIOs by identifying gpio_request(). A lot of the gpiolib functions contain code like printk(..., __func__), so the function's own name is stored as a string in the binary and you can identify most of them very easily that way.
* If you have dmesg output from the OF kernel, you can try looking for interesting messages that pop up there and seeing what code prints them.
* LCD code: kernel/arch/mips/xburst/soc-x1000/chip-x1000/halley2/common/lcd/lcd-truly_tft240240_2_e.c. See how the platform data points to useful things... you want to find stuff like that.

Look at my Shanling Q1 patch in gerrit for an idea of what kind of information you actually need to grab. I think most, if not all of the chips you're dealing with have decent datasheets so finding the GPIOs and LCD related stuff is the most important. I2C bus numbers are useful too, but then again, you can just brute force them if you really have to, as long as you know what chips are supposed to be there.
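
One way to get started on the string-based notes above is to list every source-file path embedded in the image, since macros that record __FILE__ (WARN_ON() in the generic implementation, for example) leave those paths behind as strings. The Python sketch below is a heuristic only: the regular expression, the default base address of 0x80010000, and the name find_source_paths are assumptions for illustration.

```python
import re

# A NUL-terminated string that looks like a path ending in .c, .h or .S.
# The lookbehind keeps a match from starting in the middle of a longer path.
SOURCE_PATH = re.compile(
    rb"(?<![\w./-])(/?[\w-]+(?:/[\w.-]+)+\.[chS])\x00"
)


def find_source_paths(image: bytes, base: int = 0x80010000):
    """Return (virtual_address, path) for each embedded source path."""
    return [
        (base + match.start(1), match.group(1).decode("ascii"))
        for match in SOURCE_PATH.finditer(image)
    ]
```

Each address returned can then be searched for in Ghidra; a function that loads that address together with a small constant (the line number) just before a call is a good candidate for the matching source line.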

dconrad:
Some small progress I think:

So if I understand correctly, the kernel entry address given by the uImage header (which binwalk printed out) is 0x80419300, and I can go directly to that address in the loaded kernel image. The code there seems to match up, more or less, with this line in the generic source (see screenshot).

Or, more accurately, I think that line 185 lines up with address 0x80419340:


--- Code: ---
PTR_ADDIU  t0, LONGSIZE
LONG_S  zero, (t0)
bne  t0, t1, 1b

LONG_S  a0, fw_arg0
LONG_S  a1, fw_arg1
LONG_S  a2, fw_arg2
LONG_S  a3, fw_arg3

MTC0  zero, CP0_CONTEXT
...
...
...
j  start_kernel
--- End code ---


--- Code: ---
                             LAB_80419340                                    XREF[1]:     80419348(j)
        80419340 04 00 08 25     addiu      t0,t0,0x4
        80419344 00 00 00 ad     sw         zero,0x0(t0)=>DAT_80580004
        80419348 fd ff 09 15     bne        t0,t1,LAB_80419340
        8041934c 00 00 00 00     _nop
        80419350 5a 80 01 3c     lui        at,0x805a
        80419354 20 23 24 ac     sw         a0,offset DAT_805a2320(at)         --> (matches up with LONG_S a0, fw_arg0)
        80419358 5a 80 01 3c     lui        at,0x805a
        8041935c 1c 23 25 ac     sw         a1,offset DAT_805a231c(at)          --> (matches up with LONG_S a1, fw_arg1)
        80419360 5a 80 01 3c     lui        at,0x805a
        80419364 18 23 26 ac     sw         a2,offset DAT_805a2318(at)          --> (etc.)
        80419368 5a 80 01 3c     lui        at,0x805a
        8041936c 14 23 27 ac     sw         a3,offset DAT_805a2314(at)          --> (etc.)
        80419370 00 20 80 40     mtc0       zero,Context,0x0
        ...
        ...
        ...
        804193a0 68 31 15 08     j          FUN_8054c5a0                                     undefined FUN_8054c5a0()
--- End code ---

It also seems that the fact that the kernel was gzipped inside the .bin file is not an issue? Is that true?


Edit to add: I just realized this, and it probably should have been obvious: looking at the bigger picture, there are two intermediate labels in the disassembled code (addresses 0x8041932c and 0x80419340), which probably line up with labels "0:" (line 162) and "1:" (line 185), huh? So now I'm feeling pretty dang confident this is correct, and the fact that it lines up with the uImage header's stated entry address makes it seem like I have valid code on my hands!
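
The targets Ghidra shows in the listing above can be double-checked by hand from the raw bytes. Here is a small Python sketch using the standard MIPS32 encodings for j, bne and a lui/sw pair; the helper names are invented for this example, and the byte values in the usage note are copied from the listing.

```python
import struct


def decode_word(raw4: bytes) -> int:
    """Turn four little-endian bytes from the listing into an instruction."""
    return struct.unpack("<I", raw4)[0]


def jump_target(pc: int, instr: int) -> int:
    """Target of j/jal: 26-bit index << 2, top 4 bits from the delay slot."""
    return ((pc + 4) & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)


def branch_target(pc: int, instr: int) -> int:
    """Target of a conditional branch: signed 16-bit word offset from pc+4."""
    imm = instr & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return (pc + 4 + (imm << 2)) & 0xFFFFFFFF


def lui_pair_address(lui_instr: int, mem_instr: int) -> int:
    """Address formed by lui followed by a load/store with a signed offset."""
    high = (lui_instr & 0xFFFF) << 16
    low = mem_instr & 0xFFFF
    if low & 0x8000:
        low -= 0x10000
    return (high + low) & 0xFFFFFFFF
```

For example, jump_target(0x804193a0, decode_word(bytes.fromhex("68311508"))) evaluates to 0x8054c5a0, the function Ghidra labelled FUN_8054c5a0, and branch_target(0x80419348, decode_word(bytes.fromhex("fdff0915"))) evaluates to 0x80419340, the "1:" loop label.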

dconrad:
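One thing that took me a minute to realize as well is that instructions like PTR_LA, PTR_ADDU, PTR_SUBU, LONG_S, etc. are just macros, defined mostly in kernel/arch/mips/include/asm/asm.h. On this 32-bit kernel they expand to the plain 32-bit instructions, which is why LONG_S shows up as sw and PTR_ADDIU as addiu in the disassembly.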

dconrad:
Some real progress now!


--- Quote from: amachronic on June 13, 2021, 08:29:13 PM ---To give you an idea of something you're looking for -- the goal is not to disassemble the entire kernel after all  ;) -- check out this file: arch/mips/xburst/soc-x1000/chip-x1000/halley2/common/board_base.c. There's a function in it called board_base_init(). That's a function you want to locate, because it's the only place you can find a reference to platform_devices_array. Most platform_data structs which driver probe() functions will reference, are pointed to by this array, and from the platform_data struct you can learn GPIOs, details of the LCD interface configuration, clock speeds, and other useful stuff. I2C devices will probably be registered by board_base_init too. But some stuff will be hardcoded willy-nilly in the drivers. (I guess for a kernel you never update, maintainability is not a problem...)

Since board_base_init is an initcall it can be located in a table. See init/main.c, look for the initcall_levels array. And then look at the initcall_level_names below it. You can easily find the names array because it references a unique string "postcore". Once you find that you can find do_initcall_level() and from that, find initcall_levels. The initcalls are set up in one big table and the first entry of initcall_levels will point at the first one. You can trawl through the init calls one by one until you locate board_base_init(). That may require identifying a few other functions first, but the payoff is worth it because some information simply can't be found any other way.
--- End quote ---

I'm working through your example here, and I've identified - with a fair amount of certainty - the following:


* start_kernel()
* pr_notice()
* parse_args()
* "postcore" string
* initcall_level_names[]
* do_initcall_level()
* initcall_levels[]
* __initcallX_start[] (X = 0 through 7)

It took me a minute to figure out that in Ghidra you can not only bookmark things, but also rename everything so the names you discover propagate through the code.

It also took a bit of studying, but I think I see how items are added to the __initcallX_start[] arrays. Through some magical #define sorcery that took me about an hour of staring, the labels "early", "core", "postcore", "arch", "subsys", "fs", "device", and "late" correspond to levels 0-7, and a function is added with a macro such as core_initcall(octeon_no_pci_init);, which puts octeon_no_pci_init() on the list of functions to run at level 1. Am I on track? Or off base? Or do I get a penalty for mixing my metaphors?  :P
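
Once initcall_levels[] is located, the per-level function lists can be dumped mechanically instead of clicked through. The Python sketch below assumes the 3.10-era layout in init/main.c: nine consecutive 32-bit little-endian pointers (the eight __initcallX_start symbols followed by __initcall_end), with each level's function pointers stored back to back between neighbouring boundaries. The function names and the default base address are illustrative assumptions.

```python
import struct

LEVEL_NAMES = ["early", "core", "postcore", "arch",
               "subsys", "fs", "device", "late"]


def read_words(image: bytes, base: int, va: int, count: int):
    """Read `count` little-endian 32-bit words starting at address `va`."""
    return list(struct.unpack_from("<%dI" % count, image, va - base))


def initcalls_by_level(image: bytes, levels_va: int, base: int = 0x80010000):
    """Map each level name to the function addresses registered for it.

    levels_va is the address of initcall_levels[] found in the disassembly.
    """
    # Eight level starts plus the end marker give nine boundaries.
    bounds = read_words(image, base, levels_va, len(LEVEL_NAMES) + 1)
    table = {}
    for name, start, end in zip(LEVEL_NAMES, bounds, bounds[1:]):
        table[name] = read_words(image, base, start, (end - start) // 4)
    return table
```

Printing the result gives one list of candidate function addresses per level, which narrows down where board_base_init() could be hiding.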

I still haven't identified board_base_init though...
