Ghidra reverse engineering tool - page 2 - General Discussion and Assistance - CHDK Forum

Ghidra reverse engineering tool


philmoz
Re: Ghidra reverse engineering tool
« Reply #10 on: 30 / May / 2019, 05:50:29 »
Ghidra 9.0.4 available - https://www.ghidra-sre.org/releaseNotes_9.0.4.html
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)

*

reyalp
« Reply #11 on: 01 / June / 2019, 18:25:50 »
I played with the versioning tool a bit more.

As expected, manually selecting and configuring the "correlators" seems like the better approach. This is done with the + icon on the toolbar, after starting a version tracking session.

You can run additional correlators later. Some benefit from having accepted matches.

Two good ones to start with seem to be "Exact Function Mnemonics Match" and (if you've already named symbols, e.g. with the stubs script) "Exact Symbol Name Match". After you've selected correlators, there's a wizard interface that lets you set options.

"Exact Function Mnemonics Match" has an option "Function minimum size". The text is truncated in the UI on my system, but I think it's in bytes. This seems useful to reduce the number of duplicates for very simple functions.

There is also a "Limit source and destination address sets" option, which could be used to exclude non code regions, or areas that are copied to RAM. You can even use the selection in the disassembly views.

On a410 vs a540, running the two correlators above on address ranges 0xffc00000 to the "start of data" string, with minimum function length 40 takes a few seconds and finds ~10k matches.
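
As a rough illustration of deriving such a range outside Ghidra, the sketch below searches a raw dump for a marker string and converts the file offset to an address. It assumes the dump is loaded at a single base address and that the marker appears literally as ASCII bytes; the function name and the tiny fake dump are hypothetical.

```python
def code_range(dump: bytes, base: int, marker: bytes = b"start of data"):
    """Return (start, end) addresses covering the dump up to the marker."""
    offset = dump.find(marker)
    if offset < 0:
        raise ValueError("marker not found in dump")
    # end is exclusive: it is the address of the first marker byte
    return base, base + offset


# Fake 0x100-byte "code" region followed by the marker, for illustration only.
fake_dump = b"\x00" * 0x100 + b"start of data" + b"\x00" * 16
start, end = code_range(fake_dump, 0xFFC00000)
print(hex(start), hex(end))
```

The resulting pair could then be typed into the address set limits by hand.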

Note that the "function" matches appear to only apply to things that are defined as functions, not labels on disassembled code (since it can't know the start/end otherwise). It could be useful to make the stub load script try to create functions, but this can cause incorrect results when there's a tail call b ... at the end, and a few things in stubs aren't functions. There might be included scripts which could help.
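
To make the idea behind an exact mnemonic match concrete, here is a toy sketch in plain Python. It is not Ghidra's implementation: functions are given as hypothetical address-to-mnemonic-list dictionaries, and the minimum size is counted in instructions rather than bytes. It also shows why the input has to be whole functions with a known start and end.

```python
from collections import defaultdict
from itertools import product


def exact_mnemonic_matches(src_funcs, dst_funcs, min_len=0):
    """Pair up functions whose mnemonic sequences are identical.

    src_funcs / dst_funcs map a function address to its list of mnemonics.
    Functions shorter than min_len instructions are ignored.
    """
    def index(funcs):
        by_seq = defaultdict(list)
        for addr, mnemonics in funcs.items():
            if len(mnemonics) >= min_len:
                by_seq[tuple(mnemonics)].append(addr)
        return by_seq

    src_index = index(src_funcs)
    dst_index = index(dst_funcs)
    matches = []
    for seq, src_addrs in src_index.items():
        # every source/destination combination with the same sequence is a candidate
        matches.extend(product(src_addrs, dst_index.get(seq, [])))
    return sorted(matches)


# Made-up example data: two trivial "bx" functions in the source, one in the destination.
SRC = {0xFFC01000: ["push", "ldr", "bl", "pop"], 0xFFC02000: ["bx"], 0xFFC02010: ["bx"]}
DST = {0xFF811000: ["push", "ldr", "bl", "pop"], 0xFF812000: ["bx"]}

print(len(exact_mnemonic_matches(SRC, DST)))        # trivial functions produce duplicates
print(exact_mnemonic_matches(SRC, DST, min_len=2))  # only the longer function remains
```

With no minimum, the one-instruction functions match each other ambiguously; raising the minimum leaves only the longer, unambiguous match.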

The obvious CHDK use case for this is porting. It's quite similar to what the sig finders do, but applies to the entire firmware. So if you know where something is referenced in a known firmware, finding it in the new one should be very quick.

The version tracking tool (and single program disassembly / decompiler) pay attention to which memory regions are marked as executable. So loading the parts of the ROM known to contain code separately will likely give better / faster results, though for version comparison, you can do this with the address range instead. A script to automatically set up the memory map could be quite useful, but the API documentation doesn't make it easy to discover how to do things like this.
Don't forget what the H stands for.

*

reyalp
« Reply #12 on: 03 / November / 2019, 18:34:10 »
Ghidra 9.1 has been released https://ghidra-sre.org/releaseNotes_9.1_final.html

I haven't upgraded yet.

A few other general notes (based on 9.0.4):
Setting up the memory map well improves and speeds up analysis. In particular
* initialized data (copied to 0x1900 or 0x8000)
* split the ROM into code and data, with only the code parts set to executable. Generally code is the main ROM start to _ctypes, plus romstarter if you want it. This avoids analyzing copied code at the incorrect source location, as well as time spent analyzing data
* I'd really like a way to automate this, but it's not obvious how to script memory map manipulation. Maybe it should be an "Importer".
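
As a starting point for automating this, the layout itself can be computed independently of Ghidra. The sketch below only plans the blocks; it does not call any Ghidra API, and the block names, ROM size and code end address are made-up example values (in practice the code end would be wherever _ctypes sits in the dump).

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    start: int
    length: int
    execute: bool


def plan_rom_blocks(rom_start, rom_length, code_end):
    """Split a ROM region into an executable code block and a non-executable data block."""
    rom_end = rom_start + rom_length
    if not rom_start < code_end < rom_end:
        raise ValueError("code_end must lie inside the ROM region")
    return [
        Block("rom_code", rom_start, code_end - rom_start, True),
        Block("rom_data", code_end, rom_end - code_end, False),
    ]


# Hypothetical 4 MB ROM at 0xffc00000 with code ending at 0xffe80000.
blocks = plan_rom_blocks(0xFFC00000, 0x400000, 0xFFE80000)
for b in blocks:
    print(f"{b.name}: {b.start:#010x} len {b.length:#x} exec={b.execute}")
```

A block for the initialized data copied to RAM would be planned the same way, with its own start address and the execute flag off.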

After initial analysis, running "Auto analysis" with just "scalar operand references" selected can be helpful.

If you define function prototypes in the data type manager, you can apply them by dragging/dropping on to functions in the disassembly view. I haven't found a good way to share prototypes and type definitions between programs (theoretically, "Parse C source" should allow you to import).

*

reyalp
« Reply #13 on: 09 / December / 2019, 23:52:19 »
I added a much enhanced stubs importer to svn under tools/ghidra_scripts

It includes a module for loading stubs*.S and csv files, which I hope to reuse for additional scripts.

I've only tested it with 9.1.

Description:
This script prompts for a platform/sub directory, looks for any of
 stubs_entry.S, stubs_entry.err.S, stubs_min.S, funcs_by_address.csv
and prompts for which ones to load.

It then attempts to create labels for the loaded function and variable definitions, and start disassembly on the functions if they aren't already disassembled. By default, it also attempts to copy EOL comments from stubs, and clean up incorrect data definitions related to thumb pointers. This can be controlled with options described below.
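
For readers who haven't looked inside a stubs file, the following is a minimal parsing sketch, not the code from the svn script. It assumes entries of the form MACRO(name, 0xaddress) with an optional // end-of-line comment; the sample lines, names and addresses are invented for illustration.

```python
import re

# MACRO(name ,0xaddress) optionally followed by an EOL comment
ENTRY_RE = re.compile(
    r"^\s*(\w+)\s*\(\s*(\w+)\s*,\s*(0x[0-9a-fA-F]+)\s*\)\s*(?://\s*(.*))?$"
)


def parse_stubs(text):
    """Return a list of (macro, name, address, comment) tuples."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:  # comment-only and blank lines simply don't match
            macro, name, addr, comment = m.groups()
            entries.append((macro, name, int(addr, 16), (comment or "").strip()))
    return entries


SAMPLE = """\
// hypothetical excerpt, not from a real port
NHSTUB(ExampleFunc                  ,0xff812340) //120
DEF(example_var                     ,0x00001234) // Found @0xff801000
DEF_CONST(EXAMPLE_CONST             ,0x00000010)
"""

for entry in parse_stubs(SAMPLE):
    print(entry)
```

Keeping the macro name in each tuple lets a caller decide later which kinds of entries become labels and which (like DEF_CONST) are only recorded.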

Variable definitions require that you have created a memory map with the appropriate address space.

It currently loads but does not do anything with DEF_CONST values.

Some behavior can be controlled by an ini format configuration file, which is created automatically in the same directory as the script if not present. The default cfg includes a description of each option, and can be created by just running the script and cancelling.
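
The create-if-missing behavior can be pictured with Python's standard configparser. This is only a sketch of the pattern, not the script's actual code: the section name, file name and default values are hypothetical, and only the option names are taken from this post.

```python
import configparser
import tempfile
from pathlib import Path

SECTION = "stubs_loader"  # hypothetical section name
DEFAULTS = {              # option names from the post, values are guesses
    "create_dupe_names": "true",
    "clean_thumb_data": "true",
}


def load_or_create_cfg(path):
    """Read the ini file at path, writing a default one first if it is missing."""
    path = Path(path)
    cfg = configparser.ConfigParser()
    if not path.exists():
        cfg[SECTION] = DEFAULTS
        with path.open("w") as f:
            cfg.write(f)
    else:
        cfg.read(path)
    return cfg[SECTION]


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "example_loader.cfg"
    first = load_or_create_cfg(cfg_path)    # file is created with defaults
    created = cfg_path.exists()
    second = load_or_create_cfg(cfg_path)   # file is read back
    dupes = second.getboolean("create_dupe_names")
    print(created, dupes)
```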

By default, if multiple different names refer to the same address, the script creates multiple labels. This is convenient since you can jump to any of them, and see in the listing when multiple names refer to the same function.
The create_dupe_names option controls this.

There are also options to handle conflicts where one name refers to multiple addresses, both in the stubs files (generally from the csv, where tasks or eventprocs can appear defined with multiple addresses) and pre-existing labels in the Ghidra program (e.g. from previous runs with different stubs).

The default is for stubs conflicts (same name with two different addresses) to be named with a suffix like
 _stubs_<address>.
The conflict_stubs option controls this.
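
A small sketch of that default naming, in plain Python. It assumes the address in the suffix is written as 8 lowercase hex digits and that every conflicting entry gets a suffix; both are guesses for illustration, and the names and addresses are made up.

```python
def resolve_stub_conflicts(entries):
    """entries: list of (name, address). Returns a list of (label, address).

    Names that map to a single address are kept as-is; names that map to
    several addresses get a _stubs_<address> suffix per address.
    """
    by_name = {}
    for name, addr in entries:
        addrs = by_name.setdefault(name, [])
        if addr not in addrs:  # exact duplicates are not conflicts
            addrs.append(addr)

    labels = []
    for name, addrs in by_name.items():
        if len(addrs) == 1:
            labels.append((name, addrs[0]))
        else:
            labels.extend((f"{name}_stubs_{addr:08x}", addr) for addr in addrs)
    return labels


example = [
    ("task_Example", 0xFF810000),
    ("task_Example", 0xFF820000),
    ("ExampleFunc", 0xFF830000),
    ("ExampleFunc", 0xFF830000),
]
for label, addr in resolve_stub_conflicts(example):
    print(f"{addr:#010x} {label}")
```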

For conflicts between the incoming stubs and the existing program, the default is to delete the one in the program, so if you had a stub wrong, running the script again with corrected stubs will update it.  The conflict_names option controls this.

By default, comments from stubs files that don't look like auto generated match type or score will be added to the program as pre-comments. Any existing comment of the same type will be overwritten. This can be controlled with the stubs_comments and stubs_comments_type options.

The script also tries to fix up an issue I noticed on thumb2 firmware, where the analysis can end up creating data at the second byte of a thumb function if it happens to look like a string, due to interpreting a thumb pointer as a possible data address.
The clean_thumb_data option controls this.
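
The underlying arithmetic is simple: a pointer to a thumb function has bit 0 set, so taken literally as a data address it points at function start + 1. The sketch below finds such candidates from plain address lists; it is an illustration of the check, not the script's code, and the addresses are invented.

```python
def misplaced_thumb_data(thumb_function_starts, data_addresses):
    """Return data addresses that sit exactly one byte into a thumb function."""
    starts = set(thumb_function_starts)
    return sorted(a for a in data_addresses if a & 1 and (a - 1) in starts)


functions = [0xFC100000, 0xFC100010]
data_defs = [0xFC100001, 0xFC200000, 0xFC100011, 0xFC100021]
print([hex(a) for a in misplaced_thumb_data(functions, data_defs)])
```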



*

reyalp
« Reply #14 on: 10 / December / 2019, 16:12:34 »
Some other random Ghidra notes:

If you run a script and don't like the results, "undo" rolls everything back.

The directory browser is a bit wonky (on Windows, at least). You have to select the directory you want, either by being in the parent directory and having the sub you want clicked, or being in the sub with nothing in the selected name area. It's easy to be in say sub/100b and have 100b selected and get a file error because there's no 100b/100b

On pre-digic 6 processors, the "Non returning functions - detected" analysis option seems to get a lot of false positives (ClearEventFlag in particular), which breaks a lot of disassembly and is annoying to clean up. I'd suggest leaving it off.

Ghidra likes to assign switch labels to commonly called functions like DebugAssert, so by default you see some random switch label in the code view.

If you define memory regions for things like MMIO, you can click on one MMIO access, and see what refers to the MMIO, including indirect references. But d6 MMIO space is huge and leads to a lot of false positives for data references.

Having all your firmware dumps in one project is good because it allows you to use the version tracking tool, and have multiple dumps open at the same time.

When you import a dump (a "program" in Ghidra terminology) you can put it in a folder and name it something other than primary.bin. Naming it something like a540-100b-primary.bin makes it easier to identify at a glance rather than having a whole bunch of windows titled "primary.bin"

There are several ways you can open multiple firmwares:
* Open each in separate code browsers. Each one is entirely separate. I've sometimes seen warnings related to saving settings when one is closed, but it seems to be OK.
* Open one, and then open the other from the file menu. Each firmware becomes a tab in code browser. Filters in things like the strings window are shared. This is slightly annoying because the "back" navigation operates on whichever tab you last interacted with. It also takes time to update the string view when switching between tabs. You can use the camera icon to create additional listing views for side by side comparison.
* The version tracking tool automatically opens code views.
* There is also a diff view within the code browser. I haven't played with this much, but it seems like it would be useful for comparing individual functions.

*

reyalp
« Reply #15 on: 10 / December / 2019, 18:37:06 »
Version tracking workflow for porting, loosely based on the "workflow" item in version tracking help.

Pick a similar, already ported cam and the camera to be ported.
Load both dumps as in https://chdk.setepontos.com/index.php?topic=13718.msg139705#msg139705 and do initial analysis.

For the unported camera, build stubs. If copying from an existing port, be sure to remove or nullsub out functions in stubs_entry_2 (they can get picked up in funcs_by_*.csv).

In each dump, run the load stubs script. For the unported camera, exclude stubs_min.S and stubs_entry_2.S. For the ported camera, include all.

Close source tool(s) and open the version tracking tool.
Start a new session, picking the ported cam as the source and the new cam as the destination.

Click the + in the toolbar, select the "exact symbol name" correlator and run.

In the list view, click the lightbulb to filter
Uncheck implied matches, and uncheck all sources except Imported and User Defined
Select all, accept. Good accepted matches theoretically help identify other matches. There might be a few bad stubs, but the vast majority should be good.

Reset filters to default

Click the + again, select "exact function mnemonic" match. You can set minimum length to something longer (I used 20).

Once that completes, you can use the search bar on functions not found by the sig finder. If there's a match, it will be shown in the list. This includes not just the exact mnemonic matches, but also implied matches. (screenshot)

You can use the source viewer of the ported firmware to jump to missing functions / variables and check whether there is a match.

If you have set up the memory map to include data, this will also find  variables as implied matches. (screenshot 2)

The mnemonic matches should also be fairly safe to accept in bulk, especially with a longer minimum length, although there are some larger functions duplicated in multiple locations.
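
The lookup step in this workflow amounts to a simple join, sketched below in plain Python. It assumes you have somehow written down the accepted matches as a source-address to destination-address mapping (the sketch does not use any Ghidra API to obtain them); all names and addresses are hypothetical.

```python
def propose_missing_stubs(ported_stubs, new_stubs, matches):
    """Suggest addresses for names known in the ported cam but missing in the new one.

    ported_stubs / new_stubs: name -> address
    matches: source (ported) address -> destination (new) address
    """
    proposals = {}
    for name, src_addr in ported_stubs.items():
        if name in new_stubs:
            continue  # already found, e.g. by the sig finder
        dst_addr = matches.get(src_addr)
        if dst_addr is not None:
            proposals[name] = dst_addr
    return proposals


ported = {"ExampleFuncA": 0xFFC01000, "ExampleFuncB": 0xFFC02000, "ExampleFuncC": 0xFFC03000}
new = {"ExampleFuncB": 0xFF812000}
accepted = {0xFFC01000: 0xFF811000, 0xFFC02000: 0xFF812000}

for name, addr in propose_missing_stubs(ported, new, accepted).items():
    print(f"{name}: {addr:#010x}")
```

Anything proposed this way would still need checking by eye in the listing, since bulk-accepted matches can include duplicates.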


*

koshy
« Reply #16 on: 11 / December / 2019, 17:19:16 »
@reyalp Thanks for the continued posts to this thread. Thus far it's been more of an interesting thing to follow in reading but with the stubs importer script and all I figured I'd play some with it.

Eventually it turned out that for me the main caveat was probably negligence...

* With the project selected, choose file, import file, select primary.bin. I'm using sx710 here. Creating a folder in the project tree is a good idea if loading multiple firmwares. You can also name the "program" something other than PRIMARY.BIN
* Format - raw binary
* Language ARM v7t 32 bit little endian default (Digic 6 is v7, earlier should be v5)
I read "ARM v5 32 bit little endian" out of the above for older cameras. That sure does something but the stubs importer encounters a null pointer and stuff generally won't work. "ARM v5t 32 bit little endian" is what I should have read out of it I guess. With that it all came together nicely.

I had pondered the idea to look at codegen some more sometime when I looked into porting S80 to an unencountered FW last year using legacy tools. I had even decided on a camera for that venture. Got two of them, too but never got around to looking at it as the last 12 months were let's say bumpy.
Who knows maybe this is a start eventually, maybe Christmas gets in the way.  The camera I'm interested in is Ixus 990, it's DRYOS R31 and I figured I'd use Ixus 100 which is also R31 as a base to start from since that port seems to use codegen to a good extent.

Anyway, right now this is just to say I'm reading this thread and trying to put it to some use, too ;)
« Last Edit: 11 / December / 2019, 17:23:29 by koshy »
Koshy had a little ELPH which wasn't white as snow but everywhere that Koshy went the ELPH was sure to go. (actually an SD, but that detail ruins the rhyme...)

*

reyalp
« Reply #17 on: 11 / December / 2019, 18:22:27 »
Quote
* Language ARM v7t 32 bit little endian default (Digic 6 is v7, earlier should be v5)
I read "ARM v5 32 bit little endian" out of the above for older cameras. That sure does something but the stubs importer encounters a null pointer and stuff generally won't work. "ARM v5t 32 bit little endian" is what I should have read out of it I guess. With that it all came together nicely.
Yes, I always picked the "t" variant but wasn't sure if it mattered. Updated the post.

I'd like to distill these into a wiki page at some point, but at the moment I'm still figuring out what works and taking notes as I go.
« Last Edit: 11 / December / 2019, 19:02:15 by reyalp »


 
