gcc. windows dev kit, binary size & the whole ordeal (discussion here) - page 4 - CHDK Releases - CHDK Forum

chr (IXUS 82 IS)
« Reply #30 on: 17 / November / 2008, 17:46:14 »
We have more than one error and many symptoms:

1. binutils 2.19
   The cam doesn't boot at all. Also, core/main.dump does not start with _start! However, I disassembled the final binary and it does seem to start with our bootstrap loader ... but it is not working! Did CHDK's custom link script fail?!

So I downgraded to binutils 2.18.


2. gcc-4.3.2

Here comes violence!

I built gcc under Linux following these instructions: User:Geekmug/Compiling CHDK under Windows (CHDK Wiki).

The cam boots, but crashes in REC mode. It crashes while fetching battery data for the display; if I disable the battery display in CHDK, there is no crash at this point.

Then I rebuilt gcc with the gcc/config/arm/t-arm-elf hack and Hacki's flags:

../configure --target=arm-elf --prefix=~/arm-tools --enable-languages=c --disable-sanity-checks --disable-shared --disable-newlib --disable-libssp

Note, no cpu flag. Result:

libgcc is [FPA float format]

check with:
arm-tools/bin/arm-elf-objdump -x ./arm-elf/thumb/libgcc/libgcc.a | head

Then I built CHDK with:

CFLAGS+=-mcpu=arm946e-s

Error: libgcc uses FPA and core/main.elf uses VFP

(check with objdump -x core/main.elf)
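The FPA/VFP mismatch can be spotted before linking by comparing the float-format tag in the `objdump -x` headers of libgcc.a and core/main.elf. This is a minimal sketch, assuming the header carries a tag like `[FPA float format]` (as quoted above) or an analogous `[VFP float format]`; the helper names and the sample header strings are made up for illustration and were not captured from a real build.

```python
import re

# Matches the float-format tag that objdump -x prints among the ELF private flags.
_FLOAT_TAG = re.compile(r"\[(FPA|VFP) float format\]")

def float_format(objdump_text):
    """Return 'FPA' or 'VFP' from objdump -x output, or None if no tag is found."""
    match = _FLOAT_TAG.search(objdump_text)
    return match.group(1) if match else None

def same_float_format(libgcc_dump, elf_dump):
    """True only if both dumps carry a float-format tag and the tags agree."""
    a, b = float_format(libgcc_dump), float_format(elf_dump)
    return a is not None and a == b

# Illustrative header fragments (assumed, not real build output).
libgcc_header = "private flags = 0: [FPA float format]"
main_elf_header = "private flags = 200: [VFP float format]"

print(float_format(libgcc_header), float_format(main_elf_header))
print("compatible:", same_float_format(libgcc_header, main_elf_header))
```

In practice the two strings would come from running objdump on the real files; the comparison itself stays the same.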

Next try:

CFLAGS+=-mcpu=arm946e-s -mfpu=fpa

compiles, cam boots, but crashes in get_batt_average()

Next:
CFLAGS+=-mtune=arm946e-s

compiles, cam boots and works FINE!!!


I diffed the generated assembly ... see next post ... coming soon

 :lol  >:(  :lol  >:(


chr (IXUS 82 IS)
« Reply #31 on: 17 / November / 2008, 18:03:39 »
Defective code from gcc -mcpu=arm946e-s -mfpu=fpa:

Code: (asm)
000c8904 <get_batt_average>:
   c8904: b5f0      push {r4, r5, r6, r7, lr}
   c8906: 4f11      ldr r7, [pc, #68] (c894c <get_batt_average+0x48>)
   c8908: 4e11      ldr r6, [pc, #68] (c8950 <get_batt_average+0x4c>)
   c890a: 683c      ldr r4, [r7, #0]
   c890c: 4d11      ldr r5, [pc, #68] (c8954 <get_batt_average+0x50>)
   c890e: 0064      lsls r4, r4, #1
   c8910: 5b62      ldrh r2, [r4, r5]
   c8912: 6833      ldr r3, [r6, #0]
   c8914: 1a9b      subs r3, r3, r2
   c8916: 6033      str r3, [r6, #0]
   c8918: f01f fec0 bl e869c <__stat_get_vbatt_from_thumb>
   c891c: 6839      ldr r1, [r7, #0]
   c891e: 5360      strh r0, [r4, r5]
   c8920: 004b      lsls r3, r1, #1
   ; according to ROMLOG we crashed around here on an illegal instruction
   c8922: 5b5a      ldrh r2, [r3, r5]
   c8924: 6833      ldr r3, [r6, #0]
   c8926: 3101      adds r1, #1
   c8928: 18d0      adds r0, r2, r3
   c892a: 4a0b      ldr r2, [pc, #44] (c8958 <get_batt_average+0x54>)
   c892c: 6030      str r0, [r6, #0]
   c892e: 6813      ldr r3, [r2, #0]
   c8930: 6039      str r1, [r7, #0]
   c8932: 4299      cmp r1, r3
   c8934: d900      bls.n c8938 <get_batt_average+0x34>
   c8936: 6011      str r1, [r2, #0]
   c8938: 2963      cmp r1, #99
   c893a: d902      bls.n c8942 <get_batt_average+0x3e>
   c893c: 4a03      ldr r2, [pc, #12] (c894c <get_batt_average+0x48>)
   c893e: 2300      movs r3, #0
   c8940: 6013      str r3, [r2, #0]
   c8942: 4b05      ldr r3, [pc, #20] (c8958 <get_batt_average+0x54>)
   c8944: 6819      ldr r1, [r3, #0]
   c8946: f01e fb49 bl e6fdc <__aeabi_uidiv>
   c894a: bdf0      pop {r4, r5, r6, r7, pc}
   c894c: 000f8e4c .word 0x000f8e4c

000e869c <__stat_get_vbatt_from_thumb>:
   e869c: 4778      bx pc
   e869e: 46c0      nop (mov r8, r8)

000e86a0 <__stat_get_vbatt_change_to_arm>:
   e86a0: eaffad5b b d3c14 <stat_get_vbatt>

000d3c14 <stat_get_vbatt>:
   d3c14: ea000c6f b d6dd8 <_VbattGet>

000d6dd8 <_VbattGet>:
   d6dd8: e59ff194 ldr pc, [pc, #404] ; d6f74 <_write+0xe4>

   d6f74: ff8207e4 .word 0xff8207e4
So far OK? Guess what: the ROM routine called at 0xff8207e4 ends with:

mov   pc, lr

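This does not switch back to Thumb: lr still points into the Thumb caller, so the CPU resumes there in ARM state and runs into an illegal instruction.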

Code from gcc -mtune=arm946e-s (gcc selected -mfpu=fpa):

Code: (asm)
000c8b7c <get_batt_average>:
   c8b7c: b5f0      push {r4, r5, r6, r7, lr}
   c8b7e: 4f12      ldr r7, [pc, #72] (c8bc8 <get_batt_average+0x4c>)
   c8b80: 4e12      ldr r6, [pc, #72] (c8bcc <get_batt_average+0x50>)
   c8b82: 683c      ldr r4, [r7, #0]
   c8b84: 4d12      ldr r5, [pc, #72] (c8bd0 <get_batt_average+0x54>)
   c8b86: 0064      lsls r4, r4, #1
   c8b88: 5b62      ldrh r2, [r4, r5]
   c8b8a: 6833      ldr r3, [r6, #0]
   c8b8c: 1a9b      subs r3, r3, r2
   c8b8e: 6033      str r3, [r6, #0]
   c8b90: f021 f854 bl e9c3c <__stat_get_vbatt_from_thumb>
   c8b94: 6839      ldr r1, [r7, #0]
   c8b96: 5360      strh r0, [r4, r5]
   c8b98: 004b      lsls r3, r1, #1
   c8b9a: 5b5a      ldrh r2, [r3, r5]
   c8b9c: 6833      ldr r3, [r6, #0]
   c8b9e: 3101      adds r1, #1
   c8ba0: 18d0      adds r0, r2, r3
   c8ba2: 4a0c      ldr r2, [pc, #48] (c8bd4 <get_batt_average+0x58>)
   c8ba4: 6030      str r0, [r6, #0]
   c8ba6: 6813      ldr r3, [r2, #0]
   c8ba8: 6039      str r1, [r7, #0]
   c8baa: 4299      cmp r1, r3
   c8bac: d900      bls.n c8bb0 <get_batt_average+0x34>
   c8bae: 6011      str r1, [r2, #0]
   c8bb0: 2963      cmp r1, #99
   c8bb2: d902      bls.n c8bba <get_batt_average+0x3e>
   c8bb4: 4a04      ldr r2, [pc, #16] (c8bc8 <get_batt_average+0x4c>)
   c8bb6: 2300      movs r3, #0
   c8bb8: 6013      str r3, [r2, #0]
   c8bba: 4b06      ldr r3, [pc, #24] (c8bd4 <get_batt_average+0x58>)
   c8bbc: 6819      ldr r1, [r3, #0]
   c8bbe: f01f fcdd bl e857c <__aeabi_uidiv>
   ; see that? funky: the return goes through bx r1 instead of pop {..., pc}
   c8bc2: bcf0      pop {r4, r5, r6, r7}
   c8bc4: bc02      pop {r1}
   c8bc6: 4708      bx r1
   c8bc8: 000fa3f8 .word 0x000fa3f8


000e9c3c <__stat_get_vbatt_from_thumb>:
   e9c3c: 4778      bx pc
   e9c3e: 46c0      nop (mov r8, r8)

000e9c40 <__stat_get_vbatt_change_to_arm>:
   e9c40: eaffaa34 b d4518 <stat_get_vbatt>

Aha, some extra glue:
000d4518 <stat_get_vbatt>:
   d4518: e52de004 push {lr} ; (str lr, [sp, #-4]!)
   d451c: eb000d05 bl d7938 <_VbattGet>
   d4520: e49de004 pop {lr} ; (ldr lr, [sp], #4)
   d4524: e12fff1e bx lr

000d7938 <_VbattGet>:
   d7938: e59ff194 ldr pc, [pc, #404] ; d7ad4 <_write+0xe4>

   d7ad4: ff8207e4 .word 0xff8207e4

So this code works!

It doesn't matter whether we use FPA or VFP as long as libgcc and CHDK are compiled the same way ... presumably we don't call any ROM functions that take or return floats?!

Finally, compiling CHDK without any -mtune or -mcpu flags works as well. However, the final assembly differs a little bit ... around +0x24 bytes.

But then, we just found another bug: the benchmark test does not work in the latest version ...  >:(
« Last Edit: 17 / November / 2008, 18:05:50 by chr »

chr (IXUS 82 IS)
« Reply #32 on: 18 / November / 2008, 12:47:17 »
Hiho!

To complete the confusion, this is how "gcc3 -march=armv5te" does it:

Code: (asm)
000d4850 <stat_get_vbatt>:
   d4850: e1a0c00d mov ip, sp
   d4854: e92dd800 stmdb sp!, {fp, ip, lr, pc}
   d4858: e24cb004 sub fp, ip, #4 ; 0x4
   d485c: eb000f74 bl d8634 <_VbattGet>
   d4860: e89d6800 ldmia sp, {fp, sp, lr}
   d4864: e12fff1e bx lr

So it also restores sp! That seems to be some more unnecessary overhead.

Conclusions:

* In all cases, a call from Thumb to ARM carries some braindead overhead.

* The -march / -mcpu settings have side effects on that ... so are there reliable options to control this? Maybe some __attribute__s on functions?!

* The 'defective' code with -mcpu=arm946e-s is correct (!) because the compiler assumes that functions do not return with
  mov   pc, lr ... but many ROM functions end with that (looks like some library function). In the case of VbattGet, the wrapper ends with a tail call (the shorthand GOSUB+RETURN = GOTO) ... so it's almost impossible to see which functions are affected.

Possible fix in code: the "ldr     pc, [pc, #404]" used to call a ROM function comes from stubs_asm.h ... we may manually add the right glue around it.

PS: just a reminder: bx, blx, ldr pc, <something> and ldm (with pc in the register list) can switch the CPU state, whereas b, bl and mov pc, <something> don't!
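The reminder above is enough to replay both call chains from the listings. What follows is a toy model, not an emulator: it assumes an ARMv5TE core, describes each branch target only by bit 0 of its address (1 = Thumb, 0 = ARM), and uses made-up names (`run`, `defective`, `working`) for illustration.

```python
# Toy model of ARM/Thumb interworking on an ARMv5TE core such as the ARM946E-S.
# A branch target is described only by bit 0 of its address: 1 = Thumb, 0 = ARM.

INTERWORKING = {"bx", "ldr pc", "ldm pc"}   # copy bit 0 of the target into the state
PLAIN = {"b", "bl", "mov pc"}               # never change the state

def run(state, branches):
    """Follow a list of (mnemonic, target_bit0) branches and return the final state."""
    for mnemonic, bit0 in branches:
        if mnemonic in INTERWORKING:
            state = "thumb" if bit0 else "arm"
        elif mnemonic not in PLAIN:
            raise ValueError("unknown branch: " + mnemonic)
    return state

# -mcpu=arm946e-s build: the wrapper became a tail call, so the ROM routine
# returns straight to the Thumb caller with a plain mov pc, lr.
defective = [
    ("bl", 1),      # get_batt_average -> __stat_get_vbatt_from_thumb (Thumb)
    ("bx", 0),      # bx pc: switch to ARM
    ("b", 0),       # -> stat_get_vbatt
    ("b", 0),       # -> _VbattGet (tail call, lr untouched)
    ("ldr pc", 0),  # -> ROM routine at 0xff8207e4
    ("mov pc", 1),  # ROM return to the Thumb caller: state stays ARM
]

# -mtune=arm946e-s build: stat_get_vbatt saves lr and calls with bl.
working = [
    ("bl", 1),
    ("bx", 0),
    ("b", 0),
    ("bl", 0),      # stat_get_vbatt -> _VbattGet, lr now points into ARM code
    ("ldr pc", 0),
    ("mov pc", 0),  # ROM return lands in ARM code, so no switch is needed
    ("bx", 1),      # bx lr in stat_get_vbatt: back to Thumb
]

print("defective:", run("thumb", defective))
print("working:  ", run("thumb", working))
```

The defective chain ends in ARM state at a Thumb return address, which matches the illegal-instruction crash in the ROMLOG; the working chain ends in Thumb state as the caller expects.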


« Last Edit: 18 / November / 2008, 13:00:02 by chr »

reyalp
« Reply #33 on: 23 / November / 2008, 02:07:47 »

Quote
* The -march / -mcpu settings have side effects on that ... so are there reliable options to control this? Maybe some __attribute__s on functions?!

* The 'defective' code with -mcpu=arm946e-s is correct (!) because the compiler assumes that functions do not return with
  mov   pc, lr ... but many ROM functions end with that (looks like some library function). In the case of VbattGet, the wrapper ends with a tail call (the shorthand GOSUB+RETURN = GOTO) ... so it's almost impossible to see which functions are affected.

Possible fix in code: the "ldr     pc, [pc, #404]" used to call a ROM function comes from stubs_asm.h ... we may manually add the right glue around it.

PS: just a reminder: bx, blx, ldr pc, <something> and ldm (with pc in the register list) can switch the CPU state, whereas b, bl and mov pc, <something> don't!

The wrappers are supposed to avoid this by only calling ROM code from ARM, never directly from Thumb ... that's why we have wrappers in the first place! But apparently GCC assumes that in the 946e-s case all code will return with BX LR, and so it optimizes away part of our wrapper.
Don't forget what the H stands for.


whim (A495/590/620/630 ixus70/115/220/230/300/870 S95)
« Reply #34 on: 10 / December / 2008, 02:41:17 »
@reyalp

While working on GCC 4.3.2 support for CHDK-Shell (WIP, see here: Windows GUI for trunk building [currently v. 1.90]) I played a bit with the Makefile infoline, and I think it might be useful to include this in trunk in the future (the suggested addition is the GCC $(GCC_VERSION) part):

Quote
# at lines 55/56 in Makefile @ trunk 625
infoline:
   @echo "**** GCC $(GCC_VERSION) : BUILDING CHDK-$(VER), #$(BUILD_NUMBER) FOR $(PLATFORM)-$(PLATFORMSUB)"

... no rocket science, but should show the compiler version used on screen and in log files  :D

wim

BTW -- isn't this thread a bit misplaced here  ;)
« Last Edit: 10 / December / 2008, 03:04:13 by whim »

whim (A495/590/620/630 ixus70/115/220/230/300/870 S95)
« Reply #35 on: 01 / February / 2009, 18:47:35 »
Hi !

This is just to let you know that I got a working GCC-4.3.3-for-CHDK compiled under MinGW/MSYS ...
according to RedHat's server (ftp://sources.redhat.com/pub/gcc/releases), GCC 4.3.3 was released on Jan 26 2009.

I used the same method as for the 4.3.2 devkit (which is here: http://drop.io/chdkdev) i.e.
the method as described on www.yagarto.de - with tweaks derived from Geekmug's guide to build under cygwin to adapt the configure options.

Note that all parts of the devkit that target the host platform were simply taken from the 'original'
gcc_env_for_hdk_346.rar, so the Win32 gcc is still at version 3.2.3

components used: gcc-(core/g++)4.3.3 - binutils-2.18 - newlib 1.17.0 - gmp-4.2.4 - mpfr-2.3.2
(to see the config options used, just do an: 'arm-elf-gcc -v')

some DISKBOOT.BIN compilation size results, in bytes
(don't know about autobuild, but 'mine' use vanilla trunk settings: opt_md_debug & opt_fi2 OFF, rest ON)

camera           firm    trunk    gcc346     gcc431*     gcc432     gcc433
ixus70_sd1000    101B    701      264,564    254,444*    254,432    254,300
A590             101B    701      273,729    264,041*    264,025    263,889


* = from autobuild server
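To put the table into perspective, the saving from gcc 3.4.6 to gcc 4.3.3 can be worked out directly. This is a small sketch that copies the figures from the table and assumes they are byte counts; the `sizes` dictionary and the `saving` helper are illustrative names only.

```python
# DISKBOOT.BIN sizes copied from the table (trunk 701), assumed to be bytes.
sizes = {
    "ixus70_sd1000": {"gcc346": 264564, "gcc431": 254444, "gcc432": 254432, "gcc433": 254300},
    "A590": {"gcc346": 273729, "gcc431": 264041, "gcc432": 264025, "gcc433": 263889},
}

def saving(camera, old="gcc346", new="gcc433"):
    """Return (bytes saved, percent saved) when moving from one compiler to another."""
    before, after = sizes[camera][old], sizes[camera][new]
    saved = before - after
    return saved, 100.0 * saved / before

for camera in sizes:
    saved, percent = saving(camera)
    print(f"{camera}: {saved} bytes smaller ({percent:.2f}%)")
```

Most of the gain comes from the jump from 3.4.6 to the 4.3.x series; the differences between 4.3.1, 4.3.2 and 4.3.3 are only a few hundred bytes.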


I'll put it on the CHDK-Shell drop.io for the moment, here: http://drop.io/gcc_for_chdkshell
will be integrated in the next CHDK-Shell 'full' release as well.


wim

« Last Edit: 01 / February / 2009, 19:11:50 by whim »

PhyrePhoX
« Reply #36 on: 02 / February / 2009, 17:13:16 »
thanks whim :)
will prolly test your next full release :)

geekmug
« Reply #37 on: 15 / February / 2009, 14:09:32 »
Based on a change in a recent revision of the trunk adding a "blx" instruction to the s5is build, I noticed that the makefile.inc was using -mtune instead of -mcpu. I've updated the build instructions for GCC based on feedback in this thread (and the other). As such, I apologize for having mistakenly specified "--with-cpu=arm9": I guessed, on the assumption that ARM named things sanely. Nevertheless, you don't *have* to specify the CPU when building GCC (it just means that you have a slightly more bloated version of GCC) because it will get picked with -mcpu and such.

Nevertheless, the trunk code needs to be changed, otherwise there will be linking problems about VFP instructions and such (as noted by chr). My current setup is to build GCC without a specific CPU target, meaning the libgcc.a ends up with VFP instructions. As chr showed, with GCC v4.x, -march=armv5te and -mcpu=arm946e-s. As such, the build of CHDK needs to only use "-mfpu=vfp". We could put "-mtune=arm946e-s"; I'm not sure it really makes much difference, but it can't do any harm. This produces working code and I think it's the safest strategy to future-proof things.

/edit: It seems I need to retract my statement about -mfpu=vfp: I upgraded from gcc 4.3.1 to 4.3.3 and now libgcc.a seems to be using FPA instead of VFP. Which also means I agree with the trunk makefile.inc saying only "-mtune=arm946e-s".

BTW, there are no __attribute__s that can be used to fix this problem. The discussion of having an "arm"/"thumb" attribute was quickly brought to a stop on gcc-dev. As far as GCC is concerned, _VbattGet should follow the same ABI, so things go wrong. The only solution would be to change the stub macros to generate proper glue.

-Scott
« Last Edit: 16 / February / 2009, 02:54:07 by geekmug »


« Reply #38 on: 07 / April / 2009, 14:51:43 »
Quote
The wrappers are supposed to avoid this by only calling ROM code from ARM, never directly from Thumb ... that's why we have wrappers in the first place! But apparently GCC assumes that in the 946e-s case all code will return with BX LR, and so it optimizes away part of our wrapper.

Can we tell gcc not to do this, using some volatile annotations or so?
openSUSE, Canon IXUS 980 IS

matc

If the ROM code is not Thumb-safe, why don't we change the wrapper
from
Code:
#define NSTUB(name, addr)\
    .globl _##name ;\
    .weak _##name ;\
    _##name: ;\
        ldr  pc, = ## addr

to

Code:
#define NSTUB(name, addr)\
    .globl _##name ;\
    .weak _##name ;\
    _##name: ;\
        push {lr} ;\
        blx addr ;\
        pop {lr} ;\
        bx lr
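One caveat with the stub proposed above, offered as a rough check rather than a verified fix: on ARMv5TE, blx with an immediate target always switches instruction set, so a blx addr issued from the ARM-mode stub would enter the ROM routine in Thumb state. The toy model below illustrates this; the alternative sequence in the second trace (mov lr, pc followed by ldr pc, =addr) is an assumption added for comparison and was not proposed in the thread, and all names are hypothetical.

```python
# Toy model: how each branch form affects the ARM/Thumb state on ARMv5TE.
def step(state, mnemonic, target_bit0=0):
    if mnemonic == "blx imm":                    # blx <label>: always toggles the state
        return "thumb" if state == "arm" else "arm"
    if mnemonic in ("bx", "blx reg", "ldr pc"):  # state taken from bit 0 of the target
        return "thumb" if target_bit0 else "arm"
    if mnemonic in ("b", "bl", "mov pc"):        # state unchanged
        return state
    raise ValueError("unknown branch: " + mnemonic)

# Proposed stub: the stub itself is ARM code and calls the ROM routine with blx addr.
proposed_entry = step("arm", "blx imm")

# Assumed alternative: mov lr, pc ; ldr pc, =addr (ROM address has bit 0 clear).
alternative_entry = step("arm", "ldr pc", 0)
# The ROM routine returns with mov pc, lr into the ARM stub,
# and the stub's final bx lr goes back to the Thumb caller.
after_rom_return = step(alternative_entry, "mov pc", 0)
back_in_caller = step(after_rom_return, "bx", 1)

print("ROM entered via blx addr in state:", proposed_entry)
print("ROM entered via ldr pc in state:  ", alternative_entry)
print("state back in the caller:         ", back_in_caller)
```

Since the ROM routine is ARM code, only the second trace enters it in the right state; the push {lr} / pop {lr} / bx lr frame around the call is the same in both variants.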
