Ghidra reverse engineering tool - page 3 - General Discussion and Assistance - CHDK Forum

Ghidra reverse engineering tool

  • 36 Replies
  • 19457 Views
*

Offline reyalp

  • ******
  • 14080
Re: Ghidra reverse engineering tool
« Reply #20 on: 29 / December / 2019, 01:25:22 »
Updated ImportCHDKStubs.py and added two more scripts:

InitCHDKMemMap.py  - Use information from stubs_entry.S to create a memory map in freshly loaded, not analyzed dumps. It should work reasonably well on most digic 2-5, as well as known digic 6 and 7 firmware. (there may be some oddballs that don't work well, s110 with ALT_ROMBASEADDR for example)
The memory map includes
* Copied code and data for the main CPU, with code marked executable
* ROM is split into executable and non-executable regions, for the main firmware code and romstarter where identified.
* Uninitialized regions are created for RAM outside of copied areas, uncached RAM, and known MMIO regions. MMIO regions are marked as volatile (the only effect I can see is decompiler showing things like "read_volatile_4()", but it makes MMIO access stand out a bit)

Creating big uninitialized blocks for RAM, uncached RAM and MMIO seems to have positives and negatives: Positive, because analysis can recognize data references, references are easy to follow, and things outside the defined areas stand out. Negative, because analysis looking for data references can see a lot of values as addresses just because a large portion of the address space is valid, especially on d6 and d7 cams with lots of RAM.
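
To make that trade-off concrete, here is a minimal Python sketch (not part of the CHDK scripts) that models a memory map as a list of regions and measures how much of the 32-bit address space it makes "valid". The region names, addresses and sizes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int
    size: int
    execute: bool = False
    volatile: bool = False
    initialized: bool = True

    def contains(self, addr):
        return self.start <= addr < self.start + self.size


# Hypothetical layout, loosely shaped like the map described above.
REGIONS = [
    Region("RAM_CODE", 0x00001900, 0x00010000, execute=True),
    Region("RAM", 0x00011900, 0x03FEE700, initialized=False),
    Region("MMIO", 0xC0000000, 0x01000000, volatile=True, initialized=False),
    Region("ROM_CODE", 0xFF810000, 0x00400000, execute=True),
]


def find_region(addr, regions=REGIONS):
    """Return the region containing addr, or None if addr is unmapped."""
    for region in regions:
        if region.contains(addr):
            return region
    return None


def mapped_fraction(regions=REGIONS):
    """Fraction of the 32-bit address space covered by the map."""
    return sum(r.size for r in regions) / float(1 << 32)
```

With these made-up numbers about 2% of all possible 32-bit values land in a mapped region, so an arbitrary constant has a real chance of looking like an address. More RAM raises that fraction, which is the negative described above.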

CleanThumb2BookmarkErrors.py - This attempts to clean up some common issues with autoanalysis on thumb2 firmware. It should be run after initial auto-analysis, or after large chunks of code are analyzed.

Background: Ghidra creates "bookmarks" when the disassembler runs into problems, like invalid instructions or data already defined where it wants to disassemble. You can see them by enabling the bookmarks window. There are other informational bookmarks too; to see just errors, click on the gear in the upper right of the window.

CleanThumb2BookmarkErrors.py iterates over "error" bookmarks and tries to clean up some common issues, like data being defined at +1 of a thumb function, and places where the disassembler started disassembling in ARM when it shouldn't have. It probably doesn't always do the right thing, but it's a significant improvement on the dumps I ran it on. By default, it removes the bookmarks if there appears to be valid disassembly at the bookmark address. This can be turned off by changing remove_resolved at the top of the file.
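
The "+1 of a thumb function" case comes from ARM interworking: a pointer to thumb code has bit 0 set, while the instructions themselves start at the even address. A minimal sketch of that relationship follows. The helper names and the example address are hypothetical, and this is not the logic the script itself uses.

```python
def split_code_pointer(ptr):
    """Return (instruction_address, is_thumb) for an interworking pointer."""
    return ptr & ~1, bool(ptr & 1)


def is_plus_one_data(data_addr, code_ptr):
    """True if data_addr is the odd 'address' of a thumb code pointer.

    Data defined there overlaps the first instruction of the function,
    which then blocks disassembly at the real (even) entry point.
    """
    entry, is_thumb = split_code_pointer(code_ptr)
    return is_thumb and data_addr == entry + 1


# Hypothetical pointer value as it might appear in a function table.
entry, is_thumb = split_code_pointer(0xFC123457)
```

Here `entry` is 0xFC123456 and `is_thumb` is True, so anything defined as data at 0xFC123457 sits inside the first instruction of that function.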

ImportCHDKStubs.py updates
* Now prompts for whether to disassemble, or just create entry points.
* Entry point mode now sets the thumb register, so auto-analysis will know which mode to start in
* Works on non-T processors for digic < 6. Since canon doesn't appear to use thumb on those cameras, using the non-T variants may avoid the disassembly mistakenly going off in thumb mode.
* Adds labels / entry points on the main firmware start, and some romstarter entries. This is useful to trigger analysis on early parts of the firmware startup.

The scripts now have a menupath set, so if you select "in tool" they'll appear under the Ghidra source viewer tools menu.

stubs_loader module updates
* Now parses comments out of stubs_entry.S, and derives a bunch of useful values from them
* Can be used by regular python 2 or 3 outside ghidra (not heavily tested)
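
As a rough idea of what parsing stubs_entry.S involves, here is a small standalone Python sketch. It is not the stubs_loader module. The sample text is made up to resemble a stubs_entry.S file, and the line formats it assumes (NHSTUB/DEF entries and "NAME = value" comments) are assumptions that should be checked against a real file.

```python
import re

# Assumed entry format: MACRO(name ,0xADDRESS)
STUB_RE = re.compile(
    r'^\s*(NHSTUB2?|NSTUB|DEF)\(\s*(\w+)\s*,\s*(0x[0-9a-fA-F]+)\s*\)')
# Assumed comment format: //   NAME = value
VALUE_RE = re.compile(r'^//\s*(\w+)\s*=\s*(0x[0-9a-fA-F]+|\d+)')

# Hypothetical sample, not taken from any real port.
SAMPLE = """\
// Values for makefile.inc
//   PLATFORMID = 12769 (0x31e1) // Found @ 0xfffe0130
//   MAXRAMADDR = 0x03ffffff
NHSTUB(GetPropertyCase                        ,0xff8b5d9c) //101
DEF(physw_status                            ,0x00011a2c) // Found @0xff821e18
"""


def parse_stubs(text):
    """Return (stubs, values): name -> address and comment name -> number."""
    stubs, values = {}, {}
    for line in text.splitlines():
        m = STUB_RE.match(line)
        if m:
            stubs[m.group(2)] = int(m.group(3), 16)
            continue
        m = VALUE_RE.match(line)
        if m:
            # base 0 accepts both "12769" and "0x03ffffff"
            values[m.group(1)] = int(m.group(2), 0)
    return stubs, values


stubs, values = parse_stubs(SAMPLE)
```

Values pulled from the comments, such as MAXRAMADDR, are the kind of information a memory map script can use to size RAM regions.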


With these updates, my suggested workflow is
* Load the PRIMARY.BIN at the rom start address
* Open the firmware, cancel auto-analysis
* Run InitCHDKMemMap.py
* Run ImportCHDKStubs.py in entry point mode
* Set auto-analysis options and run
* For thumb2 firmware, run CleanThumb2BookmarkErrors.py after auto-analysis completes

Some suggestions for auto-analysis
* Turn off "embedded media" for the first run, as it seems to misidentify some things as WAV in code. Run it from the one-shot menu afterwards instead
* Turn off "Non-returning functions - discovered". This seems to cause disassembly to stop in a lot of places it shouldn't
* Turn on "Shared return calls". This helps deal with code that does a b ... after a pop lr.
* Turn off "address tables". I'm not sure about this one, but I think it's better run as a one-shot after initial analysis, to avoid creating data from runs of things that could be addresses.

One other important thing: The auto-analysis options don't just apply to the initial, full analysis; they also apply whenever new code is disassembled. So if you turn something off for the initial run, you may want to re-enable it afterwards. The settings are still saved when you cancel.
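
One way to keep track of what was changed for the first run, so it can be put back later, is to treat the settings as data. The sketch below is plain Python with no Ghidra dependency. The option names are assumptions, written as they are believed to appear in the analysis options dialog, and the "original" state is hypothetical.

```python
# First-run settings suggested above. Option names are assumptions and
# should be checked against the analysis dialog of your Ghidra version.
FIRST_RUN = {
    "Embedded Media": False,
    "Non-Returning Functions - Discovered": False,
    "Shared Return Calls": True,
    "Create Address Tables": False,
}


def plan_changes(current, wanted):
    """Return {option: value} for every option whose current value differs."""
    return {name: value for name, value in wanted.items()
            if current.get(name) != value}


def restore_plan(original, first_run):
    """Return the options the first run changed, with their original values."""
    return {name: original[name] for name in first_run
            if name in original and original[name] != first_run[name]}


# Hypothetical starting state, standing in for whatever the project has set.
original = {
    "Embedded Media": True,
    "Non-Returning Functions - Discovered": True,
    "Shared Return Calls": True,
    "Create Address Tables": True,
}
to_apply = plan_changes(original, FIRST_RUN)
to_restore = restore_plan(original, FIRST_RUN)
```

Inside a Ghidra script, each entry could presumably be applied through the script API's setAnalysisOption call, which takes the value as a string. That wiring is left out so the sketch runs anywhere.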
Don't forget what the H stands for.

*

Offline reyalp

  • ******
  • 14080
Re: Ghidra reverse engineering tool
« Reply #21 on: 30 / December / 2019, 18:54:46 »
I started a wiki page https://chdk.fandom.com/wiki/Firmware_analysis_with_Ghidra

I'll probably make another one for the version tracking tool. https://chdk.fandom.com/wiki/Ghidra_Version_Tracking_workflow_for_porting

edit:
And 9.1.1 was released recently. Seems to be a minor bugfix update.
« Last Edit: 31 / December / 2019, 00:57:12 by reyalp »

*

Offline koshy

  • *****
  • 1096
Re: Ghidra reverse engineering tool
« Reply #22 on: 06 / January / 2020, 20:03:30 »
Thank you for the effort with the wikia pages. I followed through them as I decided to re-start my Ixus 100 VS Ixus 990 comparison project after you adjusted your scripts to no longer choke on non thumb arm v5. I found them well written. One thing that wasn't obvious to me for a while is that there are really only a few ROM base addresses, but I don't think that is need-to-know. Anyway, as far as the comparison is concerned, I found a few items for my stubs_entry_2.S that I did not find in my arm v5t project. I'm not 100% sure they weren't there, but I do recall specifically looking on the list of matches for them...

I had little time for this in the last few weeks, but I guess I made OK progress: I get a splash screen followed by a crash, and I should be through the more obvious parts of boot.c, and maybe all of capt_seq.c and movie_rec.c, as far as the reference port has them. I'm really impressed with codegen, although I don't think I'll come out of this with any deeper understanding than before. With these tools a trained monkey could get as far as I got, but it sure helps in appreciating the uniqueness, and the unlikeliness for that matter, of CHDK as a whole.
Koshy had a little ELPH which wasn't white as snow but everywhere that Koshy went the ELPH was sure to go. (actually an SD, but that detail ruins the rhyme...)

*

Offline koshy

  • *****
  • 1096
Re: Ghidra reverse engineering tool
« Reply #23 on: 06 / January / 2020, 20:12:10 »
I forgot to add the question I had in mind... In the analysis process only few error bookmarks were created. You wrote that it's worthwhile to try to resolve them. Where do I start if I'd like to try that. The screen grab has the list and the first instance of a bookmark in the error category in the code window... Ghidra works neatly on the vertical / horizontal dual screen layout I use a lot.


*

Offline reyalp

  • ******
  • 14080
Re: Ghidra reverse engineering tool
« Reply #24 on: 06 / January / 2020, 21:19:05 »
Quote
In the analysis process only few error bookmarks were created. You wrote that it's worthwhile to try to resolve them. Where do I start if I'd like to try that.
1) figure out why disassembly failed
2) fix it
:)

#1 is usually one of:
a) Ghidra thought data was code, and tried to disassemble it.
b) Ghidra thought code was data, defined data, and then ran into it disassembling.
c) Ghidra started disassembling in the wrong arm/thumb state, or on the wrong half-word alignment in thumb mode (mostly applies to armv7 / digic 6 or later only)
d) Ghidra tried to disassemble something that references an address that doesn't have initialized data in the memory map

#2 is usually a matter of undefining the incorrect thing (select and hit 'c') and then, if it was code, disassembling ('d' or f11 or f12 depending on instruction set)

Sometimes there's a sequence of problems, like it disassembled some data, that data happened to look like a function call and jumped off into other data, where the error finally occurred.
Quote
The screen grab has the list and the first instance of a bookmark in the error category in the code window...
That's a fairly common example of case a.

If you look at the preceding disassembly, there are two sets of conditional
ldmXX sp!... (AKA pop)
bXX ...
instructions with opposite conditions (NE and EQ)
So the function actually ends at the BEQ, but because they are conditional, Ghidra didn't notice and tried to disassemble the string, which was already data ("ShtCon_Face..." in the lower example). In this case the data is correctly data, so actually there's nothing wrong and you can just delete the bookmark.
If there hadn't been defined data there, you'd generally get some nonsense instructions until it ran into one that wasn't valid, which you could clear and convert to data.
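
The reasoning about opposite conditions can be sketched in a few lines of Python. This is only an illustration of the idea, not how Ghidra's analyzer works. It assumes instructions are given as hypothetical (mnemonic, condition) pairs and that nothing between the branches changes the flags.

```python
# ARM condition codes paired with their logical opposites.
OPPOSITE = {
    "eq": "ne", "ne": "eq", "cs": "cc", "cc": "cs",
    "mi": "pl", "pl": "mi", "vs": "vc", "vc": "vs",
    "hi": "ls", "ls": "hi", "ge": "lt", "lt": "ge",
    "gt": "le", "le": "gt",
}


def falls_through(instrs):
    """Return False if every path leaves through one of the branches.

    instrs is a list of (mnemonic, cond) tuples; cond "" or "al" means
    unconditional. Assumes no instruction in the list modifies the flags.
    """
    seen = set()
    for mnem, cond in instrs:
        if mnem != "b":
            continue
        if cond in ("", "al"):
            return False
        if OPPOSITE[cond] in seen:
            return False  # this branch and an earlier one cover both cases
        seen.add(cond)
    return True


# Shaped like the example above: pop + branch for NE, then the same for EQ.
tail = [("ldm", "ne"), ("b", "ne"), ("ldm", "eq"), ("b", "eq")]
```

For `tail`, `falls_through` returns False, meaning nothing after the BEQ belongs to the function, so the following bytes should not be disassembled as part of it.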

There is actually an analysis option that should avoid this:
Under "shared return calls" you can set "allow conditional jumps" which should be exactly the situation in your screenshot.

I didn't recommend this on the wiki because I hadn't figured out if it causes worse issues in some other cases. I've since tried it on some dumps and not noticed any problems, but haven't looked closely.

Quote
I followed through them as I decided to re-start my Ixus 100 VS Ixus 990 comparison project after you adjusted your scripts to no longer choke on non thumb arm v5.
I may be missing something, but if you are working on an un-ported sub of ixus100, I'd suggest using one of the existing ixus100 subs as your base. You should get nearly perfect matches with different firmware versions of the same cam.

*

Offline koshy

  • *****
  • 1096
Re: Ghidra reverse engineering tool
« Reply #25 on: 08 / January / 2020, 21:48:00 »
Thanks a lot for the extensive answer.

Quote
There is actually an analysis option that should avoid this:
Under "shared return calls" you can set "allow conditional jumps" which should be exactly the situation in your screenshot.
Great, I deleted the error bookmarks and re-ran Analysis with this one. At least some errors resolved and the others seem to have been data interpreted as code albeit I'm not as versed at telling the two apart as you are. Anyway, this is still interesting to pursue.

Quote
I may be missing something, but if you are working on an un-ported sub of ixus100, I'd suggest using one of the existing ixus100 subs as your base. You should get nearly perfect matches with different firmware versions of the same cam.
I'm not working on anything i100. I wanted to do a port with codegen for an all new camera. I picked Ixus 990. Since it is dryos 31, as is Ixus 100, and since Ixus 100 is fully done by codegen, which I meant to explore, I chose that as a base to start from. Hence the i100 to i990 comparisons. I did not start a porting thread because I wanted to see how far I'd get just with all the things available here and on the wiki. nafraf offered a helping hand if I get stuck, so we'll see. I can always start a porting thread if I need more help  :)

*

Offline reyalp

  • ******
  • 14080
Re: Ghidra reverse engineering tool
« Reply #26 on: 08 / January / 2020, 22:43:41 »
Quote
Great, I deleted the error bookmarks and re-ran Analysis with this one. At least some errors resolved and the others seem to have been data interpreted as code albeit I'm not as versed at telling the two apart as you are.
Good to know. As far as recognizing data disassembled as code, you get a feel for it after a while. Some common things:
* In ARM code, zero encodes as "andeq   r0, r0, r0".
* In thumb, it's movs r0,r0
* Coprocessor instructions involving a coprocessor other than p15, or instructions other than MRC or MCR are almost always not valid
* Sometimes the instructions look valid, but what it's doing is nonsense, like if you see
mov r0, r1
mov r0, r2
you know the second mov makes the first one useless, so a compiler probably didn't generate it.
* Sometimes if you look at the data values, you'll recognize an ASCII string or something else that's clearly data.
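
A few of these rules of thumb can be written down as simple checks. The Python sketch below is only an illustration with hypothetical helper names. The coprocessor check uses the classic ARM-state encoding (bits 27-24 for the instruction class, bits 11-8 for the coprocessor number), and on cores with VFP it would also flag legitimate floating point instructions.

```python
def is_zero_word(word):
    """0x00000000 decodes as 'andeq r0, r0, r0' in ARM state."""
    return word == 0


def printable_run(data, min_len=4):
    """True if data starts with at least min_len printable ASCII bytes."""
    count = 0
    for b in bytearray(data):
        if 0x20 <= b < 0x7F:
            count += 1
        else:
            break
    return count >= min_len


def suspicious_coproc(word):
    """True for ARM coprocessor encodings other than MCR/MRC on p15."""
    if ((word >> 25) & 0x7) == 0x6:        # LDC/STC family
        return True
    if ((word >> 24) & 0xF) == 0xE:        # CDP, MCR or MRC
        coproc = (word >> 8) & 0xF
        is_reg_transfer = (word >> 4) & 1  # 1 for MCR/MRC, 0 for CDP
        return coproc != 15 or not is_reg_transfer
    return False


def has_dead_mov(instrs):
    """True if a mov result is overwritten by the very next mov.

    instrs is a list of (mnemonic, dest, src) tuples.
    """
    for (m1, d1, _), (m2, d2, s2) in zip(instrs, instrs[1:]):
        if m1 == m2 == "mov" and d1 == d2 and s2 != d1:
            return True
    return False
```

None of these is conclusive on its own, but several hits close together are a strong hint that the bytes are really data.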

Quote
I'm not working on anything i100. I wanted to do a port with codegen for an all new camera. I picked Ixus 990.
That makes sense, I saw the 990 and incorrectly assumed SD990, which is already ported.

*

Offline reyalp

  • ******
  • 14080
Re: Ghidra reverse engineering tool
« Reply #27 on: 18 / May / 2020, 02:33:51 »
Ghidra 9.1.2 was released in February. I've been using it for a while, seems fine, not significantly different from 9.1.1.

For the CHDK ghidra scripts, in r5502-5503 I fixed some bugs that I introduced on March 17 (r5447) which prevented InitCHDKMemMap.py from setting up TCM code and exception vector regions on digic >=6 cams


*

Offline reyalp

  • ******
  • 14080
Re: Ghidra reverse engineering tool
« Reply #28 on: 22 / May / 2020, 02:10:14 »
Added two more scripts:

ListPropCalls.py
lists locations where GetPropertyCase or SetPropertyCase are called with specific propcase IDs. So for example, if you want to know where all calls related to PROPCASE_TV and PROPCASE_AV are, you can enter TV AV in the prompt, and click on the output addresses.

CommentPropCalls.py
iterates over all the known references to the propcase functions, and adds comments with the propcase ID of known propcases

Both of these only work when the call is in a function. They will warn in the console if there are calls in code that isn't part of a function. In most cases, you can fix this up using "f" or "recreate function" in ghidra
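
The listing step itself is simple once the constant argument of each call is known. Here is a minimal Python sketch of the filtering, with no Ghidra dependency. The call records, addresses and propcase IDs are all hypothetical. In the real scripts the IDs come from the platform propset header and the argument values from Ghidra's analysis.

```python
# Hypothetical propcase IDs; real values depend on the platform propset.
PROPCASES = {"TV": 264, "AV": 23, "FLASH_MODE": 143}

PROP_FUNCS = ("GetPropertyCase", "SetPropertyCase")


def find_prop_calls(calls, names, propcases=PROPCASES):
    """Return (address, function, propcase name) for matching calls.

    calls is a list of (address, function, first_arg) tuples, where
    first_arg is None when the constant could not be determined.
    names is a space-separated string such as "TV AV".
    """
    wanted = {propcases[n]: n for n in names.split()}
    hits = []
    for addr, func, prop_id in calls:
        if func in PROP_FUNCS and prop_id in wanted:
            hits.append((addr, func, wanted[prop_id]))
    return hits


# Hypothetical call sites.
CALLS = [
    (0xFF8A1234, "GetPropertyCase", 264),
    (0xFF8A5678, "SetPropertyCase", 23),
    (0xFF8A9ABC, "GetPropertyCase", 143),
    (0xFF8ADEF0, "GetPropertyCase", None),
]
hits = find_prop_calls(CALLS, "TV AV")
```

With this input, `hits` contains the first two call sites only. The last one is skipped because its argument is unknown.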

Note that when ghidra can't tell exactly what's in a register, it can produce "register relative" values. These are relative to the value on entry to the function, so it will often show something like sp - 0x8 for the pointer to the propcase value. This is not SP - 8 at the time of the call.
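
The difference between the two reference points is plain arithmetic, sketched below in Python. The function name and the stack layouts are hypothetical examples, not output from the scripts.

```python
def call_time_offset(entry_relative, sp_delta_at_call):
    """Convert an offset from SP-on-entry to an offset from SP at the call.

    entry_relative:    offset shown by the analysis, e.g. -8 for "sp - 0x8"
    sp_delta_at_call:  SP at the call minus SP on entry (negative once the
                       function has pushed registers or reserved locals)
    """
    return entry_relative - sp_delta_at_call


# Function starts with a push of two registers (8 bytes): SP moved by -8.
# "sp - 0x8" then means exactly SP at the time of the call.
after_push = call_time_offset(-8, -8)

# Push of two registers plus 8 bytes of locals: SP moved by -16.
# "sp - 0x8" then means SP + 8 at the time of the call.
after_push_and_locals = call_time_offset(-8, -16)
```

So a value displayed as sp - 0x8 normally points inside the function's own stack frame, at or above the current stack pointer, rather than below it.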

The code for leveraging ghidra's analysis tools to determine register values is in chdklib/regsanalyzer.py
This could be used to whip up other "where is function ... called with constant arguments ..." type things.

I found the required APIs looking at https://github.com/0xb0bb/pwndra

Some possible enhancements
* Use a tableChooserDialog for listing
* Remember the platform propset somehow, so you don't have to provide the header every time
* Handle veneers, other propcase related functions
* Generic script that prompts for function, argument values

*

Offline reyalp

  • ******
  • 14080
Re: Ghidra reverse engineering tool
« Reply #29 on: 26 / July / 2020, 18:40:30 »
A couple updates from earlier this month:
* Added CleanFuncBookmarks.py to make functions from things identified by Ghidra as possible functions. It also cleans up some cases where Ghidra makes a single function out of chunks of code that should be multiple, separate functions. Should be run after analysis is complete.
* Updated ImportCHDKStubs.py to try to create functions when it's run with disassembly enabled (analyzed dump mode). For best results on a fresh dump, this script should be run once before analysis with disassembly disabled, and again after analysis with disassembly enabled.

Both of these should improve the effectiveness of the version tracking tool, since it depends on identified functions.

 
