90% of this post was written before the AHA moment, but I think I have figured this one out...
g7x dump
fc13bef6: 4770 bx lr
Crash when calling the above with the Thumb bit set:
con 1> =return call_func_ptr(0xfc13bef7)
Module log entry
Tick ,Op,Address ,Name (2016:02:13 11:51:59)
12130,LD,005190d0,lua.flt
Disassemble module elf
arm-none-eabi-objdump.exe -d -x --adjust-vma=0x5190d0 lua.elf > lua-5190d0.dumpobj
There's a comment I wrote in http://chdk.wikia.com/wiki/Debugging#Debugging_modules about needing to adjust the start by 36, but with current tools and module code that doesn't seem to apply. The module ELF .text start is (intentionally? coincidentally?) at a 36-byte offset from the VMA address.
I'm not sure when this changed (EABI, maybe?), but I recall running into it before and adding the hand-wavy addendum about the start address to that comment.
The dump seems to confirm this:
005190f4 l d .text 00000000 .text
Assuming this is correct, it should be clarified in the wiki, but I don't understand it well enough to clarify it myself.
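For reference, the arithmetic behind the 36 bytes, as a small C sketch using only the addresses from the module log and the dump above. It assumes the module is loaded contiguously at the LD address, so subtracting that address from a romlog PC gives the address objdump would show without --adjust-vma:

```c
#include <assert.h>
#include <stdint.h>

/* Addresses from the post (module log and --adjust-vma objdump output). */
#define MODULE_LOAD_ADDR 0x005190D0u /* "LD,005190d0,lua.flt" */
#define TEXT_START_ADDR  0x005190F4u /* ".text" in lua-5190d0.dumpobj */

/* Distance from the load address to the start of .text (0x24 == 36). */
uint32_t text_offset(void)
{
    return TEXT_START_ADDR - MODULE_LOAD_ADDR;
}

/* Map a runtime address (e.g. a PC from the romlog) back to the address
   objdump would show without --adjust-vma. Assumes the whole module was
   loaded contiguously at MODULE_LOAD_ADDR. */
uint32_t module_offset(uint32_t runtime_addr)
{
    return runtime_addr - MODULE_LOAD_ADDR;
}
```

For example, module_offset(0x0051E11A) gives 0x504A, the unadjusted address of the faulting instruction in lua_pushnumber.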
The crash appears to be in the lua_pushnumber part of luaCB_call_func_ptr
lua_pushnumber( L, call_func_ptr(fptr, argbuf, n_args) );
Romlog
Exception!! Vector 0x10
Occured Time 2016:02:13 11:53:16
Task ID: 17760295
Task name: PhySw
Exc Registers:
0xFC13BEF7 ; r0 -> should be lua_state, but it's the function called?
0x00000004 ; r1
0x00000000 ; r2
0xCB486D47 ; r3
0xFC13BEF7 ; r4 -> should be lua_state too
0x00000000 ; r5
0x005436C8 ; r6
0x00000000 ; r7
0xFC13BEF7 ; r8
0x00000000 ; r9
0x00544FE8 ; r10
0x00544F20 ; r11
0xFC13BEF7 ; r12
0x006421E0 ; SP
0x0051B6CD ; LR luaCB_call_func_ptr ret from lua_pushnumber
0x0051E11A ; PC lua_pushnumber 51e11a: 6019 str r1, [r3, #0]
0x20000073 ; CPSR
StackDump:
0x003734C0 ; call_func_ptr (as an ARM adr?)
0xFC13BEF7 ; called func
0x00000000
0x0051B6CD ; luaCB_call_func_ptr ret from lua_pushnumber (same as LR), above matches push r3-r5, lr
0x00543468
0x00543780
0x005436C8
0xFFFFFFFF
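To check the "matches push r3-r5, lr" note: a push stores the lowest-numbered register at the lowest address, so, assuming the StackDump starts at SP, the first four words are lua_pushnumber's saved r3, r4, r5 and lr. A minimal C sketch of that mapping (the slot names are mine, not from CHDK):

```c
#include <assert.h>
#include <stdint.h>

/* First words of the romlog StackDump, assumed to start at SP (0x006421E0). */
static const uint32_t stack_dump[] = {
    0x003734C0u, 0xFC13BEF7u, 0x00000000u, 0x0051B6CDu,
    0x00543468u, 0x00543780u, 0x005436C8u, 0xFFFFFFFFu,
};

/* Slots written by lua_pushnumber's "push {r3, r4, r5, lr}": the
   lowest-numbered register goes to the lowest address, so SP points at r3. */
enum push_slot { SAVED_R3 = 0, SAVED_R4, SAVED_R5, SAVED_LR };

/* Saved value of one register from the push frame at SP. */
uint32_t saved_value(enum push_slot slot)
{
    return stack_dump[slot];
}
```

The saved r4 is 0xFC13BEF7, so r4 already held the function pointer when lua_pushnumber was entered, and the saved lr matches the LR in the register dump.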
Relevant lua_pushnumber code
0051e114 <lua_pushnumber>:
51e114: b538 push {r3, r4, r5, lr}
51e116: 4604 mov r4, r0
51e118: 6883 ldr r3, [r0, #8]
51e11a: 6019 str r1, [r3, #0] ; < data abort here
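Assuming CHDK's Lua keeps the stock Lua 5.1 struct lua_State layout (CommonHeader, then status, then top), the ldr r3, [r0, #8] is loading L->top. A hypothetical 32-bit mirror of the start of that struct shows where offset 8 comes from, and what address gets read when r0 holds the function pointer instead of L:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first fields of Lua 5.1's struct lua_State on a
   32-bit ARM target. Pointers are modelled as uint32_t so the offsets come
   out the same when this is compiled on a 64-bit host. */
struct lua_State32 {
    uint32_t next;   /* CommonHeader: GCObject *next  (offset 0) */
    uint8_t  tt;     /* CommonHeader: lu_byte tt      (offset 4) */
    uint8_t  marked; /* CommonHeader: lu_byte marked  (offset 5) */
    uint8_t  status; /* lu_byte status                (offset 6) */
    uint32_t top;    /* StkId top, after 1 byte of padding (offset 8) */
};

/* Address read by "ldr r3, [r0, #8]" when r0 holds the given pointer. */
uint32_t top_field_addr(uint32_t L)
{
    return L + (uint32_t)offsetof(struct lua_State32, top);
}
```

With r0 = 0xFC13BEF7 the load comes from 0xFC13BEFF in ROM, so r3 is presumably just whatever word is there (0xCB486D47 in the dump), and the str through it aborts.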
Relevant luaCB_call_func_ptr code
51b6ba: 4640 mov r0, r8
51b6bc: 4639 mov r1, r7
51b6be: 462a mov r2, r5
51b6c0: 4b0b ldr r3, [pc, #44] ; (51b6f0 <luaCB_call_func_ptr+0x88>)
51b6c2: 4798 blx r3 ; < call_func_ptr call
51b6c4: 4601 mov r1, r0 ; < return value goes in r1
51b6c6: 4620 mov r0, r4 ; < should be lua state L going in R0 for push
51b6c8: f7ff fffe bl 51e114 <lua_pushnumber>
51b6c8: R_ARM_THM_CALL lua_pushnumber
51b6cc: 4638 mov r0, r7 ; < LR in romlog
This all makes it look like r4 wasn't restored correctly by call_func_ptr, but I don't see how that could happen???
The disassembly of call_func_ptr in main.bin.dump seems to match the source.
One oddity is that the call_func_ptr address in the stack dump is an ARM address (bit 0 clear). Since this was presumably the r3 used by the blx r3, I'd expect it to be a Thumb address. Is the linker assuming call_func_ptr is already ARM?
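For reference, the interworking rule in play: for bx/blx with a register operand, bit 0 of the target selects the instruction set (1 = Thumb, 0 = ARM), and the code itself sits at the address with bit 0 cleared. A minimal C sketch applied to the addresses in this post:

```c
#include <assert.h>
#include <stdint.h>

/* Interworking branch targets (bx/blx Rm): bit 0 set means Thumb state. */
int is_thumb_target(uint32_t addr)
{
    return (addr & 1u) != 0;
}

/* The first instruction lives at the target with bit 0 cleared. */
uint32_t code_address(uint32_t addr)
{
    return addr & ~1u;
}
```

So blx r3 with r3 = 0x003734C0 enters call_func_ptr in ARM state, while 0xFC13BEF7 (code at 0xFC13BEF6, the bx lr above) and the LR 0x0051B6CD are Thumb addresses.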
Disassembling the call_func_ptr interworking code (bx pc plus a padding NOP) as ARM:
$ ./capdis.exe -nofwdata ../platform/g7x/sub/100d/main.bin 0x36a354 -s=0x3734c0 -c=10
WARNING gaonisoy string not found, assuming code start offset 0
andeq r4, r0, r8, ror r7
push {r4, r5, lr}
mov ip, r0
mov r4, sp
add r5, r1, r2, lsl #2
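The andeq is just the Thumb bx pc halfword (0x4778) followed by a zero padding halfword, read as one little-endian ARM word. Assuming the padding is 0x0000 (which is what the decode implies), a small C decoder for the ARM data-processing register-shifted-register form pulls the same fields back out:

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an ARM data-processing instruction with a register-shifted
   register operand: cond | 000 | opcode | S | Rn | Rd | Rs | 0 | type | 1 | Rm */
struct dp_rsr {
    unsigned cond, opcode, s, rn, rd, rs, type, rm;
};

/* Two little-endian Thumb halfwords at addr and addr+2 form one ARM word. */
uint32_t arm_word(uint16_t first, uint16_t second)
{
    return (uint32_t)first | ((uint32_t)second << 16);
}

/* Returns 1 and fills *out if bits 27-25 are 000, bit 7 is 0 and bit 4 is 1.
   Simplified: it does not filter out the miscellaneous-instruction space
   (opcode 10xx with S=0) or the unconditional (cond 1111) space. */
int decode_dp_rsr(uint32_t w, struct dp_rsr *out)
{
    if (((w >> 25) & 7u) != 0 || (w & 0x90u) != 0x10u)
        return 0;
    out->cond   = (w >> 28) & 0xFu; /* 0 = EQ */
    out->opcode = (w >> 21) & 0xFu; /* 0 = AND */
    out->s      = (w >> 20) & 1u;
    out->rn     = (w >> 16) & 0xFu;
    out->rd     = (w >> 12) & 0xFu;
    out->rs     = (w >> 8) & 0xFu;
    out->type   = (w >> 5) & 3u;    /* 3 = ROR */
    out->rm     = w & 0xFu;
    return 1;
}
```

decode_dp_rsr(arm_word(0x4778, 0x0000), &d) gives cond EQ, opcode AND, Rd r4, Rn r0, Rs r7, shift type ROR and Rm r8, i.e. andeq r4, r0, r8, ror r7.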
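In the register dump, r7 is 0 (so the ROR is a no-op) and r8 has the same value as r0, so if the EQ condition holds, this writes r0 & r8 = 0xFC13BEF7 into r4. Because it executes before push {r4, r5, lr}, the clobbered value is what gets saved and later "restored", which would explain why luaCB_call_func_ptr ends up passing the function pointer to lua_pushnumber as L. I'm not totally sure what the state of the condition flags will be at that point, but EQ seems likely, and the r4 value in the dump is consistent with it.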
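A C model of what that one instruction does to r4, using the romlog values for r0, r7 and r8. The Z flag is a parameter because its actual value at that point is unknown:

```c
#include <assert.h>
#include <stdint.h>

/* ROR by register: only the bottom byte of Rs is used. A zero amount leaves
   the operand unchanged, and multiples of 32 also give back the original
   value (only the carry-out differs, which is not modelled here). */
uint32_t ror_by_reg(uint32_t rm, uint32_t rs)
{
    uint32_t n = (rs & 0xFFu) & 31u;
    return n ? (rm >> n) | (rm << (32u - n)) : rm;
}

/* Effect of "andeq r4, r0, r8, ror r7" on r4: it only executes if Z is set,
   otherwise r4 keeps the caller's value. */
uint32_t andeq_r4(uint32_t r4, uint32_t r0, uint32_t r7, uint32_t r8, int z_flag)
{
    return z_flag ? (r0 & ror_by_reg(r8, r7)) : r4;
}
```

With Z set and r0 = r8 = 0xFC13BEF7, r7 = 0, r4 comes out as 0xFC13BEF7 regardless of what the caller had there; with Z clear it's a no-op, so the bug would depend on whatever flags the caller happened to leave behind.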
To test, I rebuilt with the .code 16 and bx pc lines commented out. It appears to work, which means that, at least with the toolchain I am using (arm-none-eabi-gcc.exe (GNU Tools for ARM Embedded Processors) 4.9.3 20141119 (release) [ARM/embedded-4_9-branch revision 218278]), the bx pc is not needed. Running call_func_ptr as Thumb would definitely crash, so the toolchain must already be doing the right thing by itself.
If this is toolchain-dependent, we should probably make a separate wrapper for call_func_ptr.