supplierdeeply

gcc. windows dev kit, binary size & the whole ordeal (discussion here)

  • 48 Replies
  • 21802 Views
*

Offline philmoz

  • *****
  • 3070
    • Photos
Advertisements
The wrappers are supposed to avoid this, by only calling ROM code from arm, never directly from thumb... that's why we have wrappers in the first place! But apparently GCC is assuming that in the 946e-s case, all code will return with BX LR and so optimizing away part of our wrapper.
Can we tell gcc to not do this using some volatile annotations or so?

If the rom code is not thumb safe, why doesn't we change the wrapper
from
Code: [Select]
#define NSTUB(name, addr)\
    .globl _##name ;\
    .weak _##name ;\
    _##name: ;\
        ldr  pc, = ## addr

to

Code: [Select]
#define NSTUB(name, addr)\
    .globl _##name ;\
    .weak _##name ;\
    _##name: ;\
        push {lr} ;\
        blx addr ;\
        pop {lr} ;\
        bx lr

The stubs code is compiled as ARM not Thumb, wouldn't 'blx addr' switch to Thumb before jumping to the firmware address? Since the firmware is ARM wouldn't this crash?

The other problem is you can't guarantee that 'addr' is within the PC relative branch offset range - which is why the current stubs load a 32 bit value into PC.

If you know a way to mix ARM and Thumb code in the same .S file then it should be possible to create a better stub system. I played around with this; but couldn't get it to work.

Phil.
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)

*

Offline reyalp

  • ******
  • 11443
The stubs code is compiled as ARM not Thumb, wouldn't 'blx addr' switch to Thumb before jumping to the firmware address? Since the firmware is ARM wouldn't this crash?
Only if it's a thumb address (bit0=1), which it shouldn't be.
Quote
The other problem is you can't guarantee that 'addr' is within the PC relative branch offset range - which is why the current stubs load a 32 bit value into PC.
BLX <constant> does not necessarily do what you want even if it is theoretically within a possible range. In some cases it doesn't even compile. You can LDR <reg>,=addr; BLX REG, but then you might as well just do LDR PC, =addr, since you are pushing and popping LR anyway.

Except that pushing LR isn't going to work: If your function expects more than 4 arguments, you'll have suddenly have LR sitting where one of them is expected. That won't end well.
Don't forget what the H stands for.

*

Offline matc

  • *
  • 11
Quote
The other problem is you can't guarantee that 'addr' is within the PC relative branch offset range - which is why the current stubs load a 32 bit value into PC.
BLX <constant> does not necessarily do what you want even if it is theoretically within a possible range. In some cases it doesn't even compile.
I tested with my binutils and if the jump is outside the 32 MB range, it generate an alternative code

Code: [Select]
000b9d10 <_AllocateMemory2>:
   b9d10:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
   b9d14:       eb004d27        bl      cd1b8 <__*ABS*0xcfc01ab0_veneer>
   b9d18:       e49de004        pop     {lr}            ; (ldr lr, [sp], #4)
   b9d1c:       e12fff1e          bx      lr

000cd1b8 <__*ABS*0xcfc01ab0_veneer>:
   cd1b8:       e51ff004        ldr     pc, [pc, #-4]   ; cd1bc <__*ABS*0xcfc01ab0_veneer+0x4>
   cd1bc:       cfc01ab0        .word   0xcfc01ab0
But it may not supported by older version of binutils.
For my camera A710, the jump isn't outside the range because firmware is at high addr 0xffe.... and chdk low addr
Code: [Select]
000ba400 <_readdir>:
   ba400:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
   ba404:       faf902b2        blx     ffefaed4 <*ABS*0xffefaed4>
   ba408:       e49de004        pop     {lr}            ; (ldr lr, [sp], #4)
   ba40c:       e12fff1e          bx      lr
Quote
You can LDR <reg>,=addr; BLX REG, but then you might as well just do LDR PC, =addr, since you are pushing and popping LR anyway.
That should need an extra mov to LR
Quote
Except that pushing LR isn't going to work: If your function expects more than 4 arguments, you'll have suddenly have LR sitting where one of them is expected. That won't end well.


Yes I forgot that. But after a quick look there no (or not too much) function with more than 4 arguments. So if could be acceptable to do special case for them.

I don't know enough the chdk code source, but if all the rom function are wrapped in wrappers.c, building it as thumb may solve the problem :
there is no bx for label. So gcc will use bl or blx :

Code: [Select]
000b5db4 <readdir>:
   b5db4:       b508            push    {r3, lr}
   b5db6:       f003 ffab       bl      b9d10 <_readdir>
   b5dba:       bd08            pop     {r3, pc}

That's strange bl is used instead of blx (I need to investigate why) and r3 is saved.
But it may be a good way to solve the issue and let's gcc handle argument on the stack case.

*

Offline matc

  • *
  • 11
Code: [Select]
That's strange bl is used instead of blx (I need to investigate why) and r3 is saved.
R3 is saved to align stack on 16 byte boundary. I build in eabi mode, but I am not sure this is the correct option (eabi specify interwork).

Is there some pointer of the abi and fpu format used by the rom. I am surprised that the options are not set in the makefile.

in order to generate blx in need to modify the stub to
Code: [Select]
#define NSTUB(name, addr)\
    .globl _##name ;\
        .align  2; \
    .type   _##name , %function; \
    .weak _##name ;\
    _##name: ;\
        ldr  pc, = ## addr


*

Offline reyalp

  • ******
  • 11443
I tested with my binutils and if the jump is outside the 32 MB range, it generate an alternative code
We have already determined that it does *different* unpredictable things under different toolchains/conditions. Unless you can find a clear definition of what it is supposed to do under which conditions, what happens on your specific toolchain setup isn't interesting.

Generally speaking, BL(X) <address> doesn't make much sense, since it's a PC relative offset and the assembler doesn't know the final address of the thing it is assembling. The assembler might choose to treat it as an offset, but in that case, it's not at all clear how any rounding/truncating/sign extension would be done turning it into a 24 bit value. It might also treat it as an address for the linker to resolve.
Quote
That should need an extra mov to LR
But it would at least be valid code ;)

Quote
Quote
Except that pushing LR isn't going to work: If your function expects more than 4 arguments, you'll have suddenly have LR sitting where one of them is expected. That won't end well.

Yes I forgot that. But after a quick look there no (or not too much) function with more than 4 arguments. So if could be acceptable to do special case for them.
That would not be acceptable IMO.

Quote
I don't know enough the chdk code source, but if all the rom function are wrapped in wrappers.c, building it as thumb may solve the problem :
The whole *point* of wrappers it to provide glue between arm and thumb. Some extra stuff is in there because it's a convenient place to paper over camera differences, but that's why it exists.

It might be possible to combine wrappers and stubs, so both the long jump and arm/thumb transition were handled by one bit of code.
Don't forget what the H stands for.

*

Offline matc

  • *
  • 11
Ok after some analysis, it seems the vxworks roms was build with apcs or atpcs without any thumb interwork.

The best way to make thumb code call rom is to use something similar to "-mcaller-super-interworking" option.

For less than 4 arguments this is easy :
Code: [Select]
#define ROM_CALL(name, addr)\
    .globl name ;\
    .align      2; \
    .type  name , %function; \
    name: ;\
    tst lr, #1;\
    pushne {lr}; \
    movne lr, pc; \
    ldr  pc, = addr; \
    pop {pc}

#define NSTUB(name, addr)\
            .weak _##name ;\
            ROM_CALL(_##name, addr)

#define NHSTUB(name, addr)\
            ROM_CALL(_##name, addr)

The remaining function are
Code: [Select]
_CreateTask

_vsprintf

_iosDrvInstall

_ExecuteEventProcedure

There are all called with bl from 32 bit code, so in theory we have nothing to do (the code above handle it with tst lr, #1).

Allowing them to return directly to thumb is more complex to handle : we have variadic functions.

May be adding a script that check that these functions aren't directly returning to thumb is an acceptable solution ?
Code: [Select]
grep -e _CreateTask -e _vsprintf -e _iosDrvInstall -e _ExecuteEventProcedure core/main.dump | grep -v ":$" | grep -v bl

PS : dryos rom seems to use bx/blx in its code. Is it  interwork safe ?

*

Offline reyalp

  • ******
  • 11443
PS : dryos rom seems to use bx/blx in its code.
Mostly, but not everywhere.
Don't forget what the H stands for.

*

Offline srsa_4c

  • ******
  • 3665
issue with floating point (solved, I guess)
« Reply #47 on: 25 / May / 2012, 21:47:05 »
My situation is off-topic, but this discussion seemed to be the closest.

I'm working on a non-standard port (S1 IS). Its CPU is an older ARM revision (ARM7TDMI, according to strings in the firmware). As I couldn't get some features to work, I found out that pow() and possibly other floating point functions return incorrect result (zero in this case). Calling _pow() or __pow() from ARM code works well (the returned value is good), pow() from thumb code makes the below routine return 0, calling __pow() from thumb code resulted in undefined instruction exception in another task (so something went wrong).

Does somebody have an idea what I should be looking at? Floating point support in the CPU or arm-thumb interwork?

Haven't tested pow() directly, I used this function with some usual values:
Code: [Select]
short shooting_get_aperture_from_av96(short av96)
{
    if (av96)
        return (short)((pow(sqrt2, ((double)av96)/96.0))*100.0);
    return -1;
}

pow() is defined in lib/math/wrapper.c , __pow() is found in firmware (edit: but it's STUB'd of course)
For gcc, I use -mtune=arm7tdmi but it produces exactly the same code as with the default CHDK option.

edit:
calling _pow() from thumb works, and produces good result

edit2: My guess is that this firmware is using a different representation for floats (it was compiled in 2004 with gcc). I wasn't able to recognize these (floating point) routines when I searched for them, they differ from the usual DIGIC II firmware code completely.
« Last Edit: 26 / May / 2012, 17:27:25 by srsa_4c »


*

Offline srsa_4c

  • ******
  • 3665
Re: gcc. windows dev kit, binary size & the whole ordeal (discussion here)
« Reply #48 on: 22 / February / 2013, 18:34:11 »
While figuring out the reason why some parts of CHDK do not work on the S1IS, I have found some interesting things.
It has an ARMv4T CPU. The firmware is not thumb-safe (this actually applies to all VxWorks cameras). Most parts of CHDK can deal with this, except (the list is probably not complete):

- call_func_ptr in lib/armutil/callfunc.S uses BLX: the routine works when BLX is substituted with supported instructions
- calling CreateTask with a thumb task as argument: the task needs an ARM wrapper to function
- function pointers from the firmware: GCC (or the linker?) generates the following instructions to call them:

b00a2:   692b         ldr   r3, [r5, #16]
b00a4:   f000 fa6c    bl   b0580
...
b0580:   4718         bx   r3


This is not working on ARMv4T, since the CPU remains in ARM mode when returning from the call (the only way to switch back to thumb is to use BX). Solution (forcing gcc/ld to use proper interworking code would be better): all parts of CHDK which use function pointers provided by the firmware need to be compiled as ARM. The only place I have found so far which is affected is the ptp code.

Such function pointers may cause trouble on VxWorks-based firmwares, if the functions they point to end with
MOV PC, LR
This instruction doesn't switch back to thumb, not even on ARMv5T. Fortunately the ptp code seems not to be affected.

 

Related Topics