suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras) - page 7 - General Discussion and Assistance - CHDK Forum supplierdeeply

suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)

  • 77 Replies
  • 22591 Views
*

Offline reyalp

  • ******
  • 14079
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #60 on: 13 / October / 2017, 03:52:49 »
Advertisements
- cams with DryOS r52 and higher (but I could be wrong)
Since the exact models are not known (and things can change in future), I think using the sigfinder for detecting the correct argument count would make the most sense. I'm not sure what the sigfinder should output, though.
I ran into this in finsig_thumb2, it finds the two argument form as DebugAssert2, and the 3 argument form as DebugAssert (it should probably reversed for compatibility with most cams). Both are present on the Digic 6 cameras, I haven't checked on older firmware. I think on newer cams the additional argument is numeric.
Don't forget what the H stands for.

*

Offline srsa_4c

  • ******
  • 4451
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #61 on: 13 / October / 2017, 18:37:31 »
I ran into this in finsig_thumb2, it finds the two argument form as DebugAssert2, and the 3 argument form as DebugAssert (it should probably reversed for compatibility with most cams). Both are present on the Digic 6 cameras, I haven't checked on older firmware.
Checked a D4+ cam (ixus150) - it also has the DebugAssert wrapper with 2 arguments.
Quote
I think on newer cams the additional argument is numeric.
To me it appears that the default assert handler ignores the first argument.

Not sure what to do, as the ixus30/40 does not appear to have such a wrapper. I also don't know for sure which models have the 3 argument version (would need to write that detection).
Since the exact models are not known (and things can change in future)
Could there be a way (e.g from the community) to know better now?
In this case, user feedback is not required - it's possible to make that list using firmware dumps and some code.

*

Offline srsa_4c

  • ******
  • 4451
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #62 on: 20 / October / 2017, 18:26:50 »
it's possible to make that list using firmware dumps and some code.
The list of supported models using 3-argument DebugAssert is:
ixus145_elph135
ixus150_elph140
ixus160_elph160
ixus30_sd200
ixus40_sd300
sx170is
sx400is
sx410is
sx510hs
sx520hs
sx530hs
+ all DIGIC 6 cams
(did not actually check this)
Partial patch is attached - finsig_vxworks, finsig_dryos, wrappers.c
Sigfinder outputs a variable (DEF_CONST), the CHDK DebugAssert wrapper decides the correct order of arguments at runtime. This could probably be done better, so suggestions are welcome. I did this to avoid manual changes to port-specific files.

*

Offline philmoz

  • *****
  • 3450
    • Photos
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #63 on: 20 / October / 2017, 18:47:01 »
it's possible to make that list using firmware dumps and some code.
The list of supported models using 3-argument DebugAssert is:
ixus145_elph135
ixus150_elph140
ixus160_elph160
ixus30_sd200
ixus40_sd300
sx170is
sx400is
sx410is
sx510hs
sx520hs
sx530hs
+ all DIGIC 6 cams
(did not actually check this)
Partial patch is attached - finsig_vxworks, finsig_dryos, wrappers.c
Sigfinder outputs a variable (DEF_CONST), the CHDK DebugAssert wrapper decides the correct order of arguments at runtime. This could probably be done better, so suggestions are welcome. I did this to avoid manual changes to port-specific files.


Given the small number why not use a compile setting, with a warning written to stubs_entry.s?
The default value would be to use the 3 arg option for DIGIC 6, otherwise use the 2 arg version.
The default value would only need to be overridden for the 11 cameras listed (in platform_camera.h)

Edit: I've combined srsa_4c's patch with mine and modified it to use a build setting for the 2/3 argument version check. New patch attached. I've almost done testing, so unless there are questions/concerns I'll try and commit this in a few days.

Edit2: Update patch, found an edge case in suba_free that was not handled correctly.

Edit3: Patch updated again. Fixed a few more bugs, added some comments and tried to make code more readable.

Edit4: Last patch before I add this to SVN. Changed the free list logic to make alloc/free simpler. The getmeminfo values are mostly maintained dynamically instead of being calculated in each call to getmeminfo.

Phil.
« Last Edit: 21 / October / 2017, 21:56:05 by philmoz »
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)
  g7x2 (1.01a, 1.01b, 1.10b)


Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #64 on: 20 / October / 2017, 23:43:54 »
it's possible to make that list using firmware dumps and some code.
The list of supported models using 3-argument DebugAssert is:
ixus145_elph135....

ixus160_elph160.....

New patch attached. I've almost done testing, so unless there are questions/concerns I'll try and commit this in a few days.

Edit2: Update patch, found an edge case in suba_free that was not handled correctly.

Phil.

I have 65 Ixus_145 and one Ixus_160 cameras and will be able to be able to test any new test build for the patch.

H-H

« Last Edit: 21 / October / 2017, 00:03:20 by Hardware_Hacker »

*

Offline reyalp

  • ******
  • 14079
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #65 on: 22 / October / 2017, 01:13:59 »
Edit4: Last patch before I add this to SVN. Changed the free list logic to make alloc/free simpler. The getmeminfo values are mostly maintained dynamically instead of being calculated in each call to getmeminfo.
Thanks for doing this, and apologies for not being more involved, just now coming out of an RL crunch.

I ran the test msgtest test case on sx160 with patch 8, and it didn't crash. It does seem slower (~240 sec for 1k iterations), but  I think this is expected because it's a pathological case.

I tried to do some general stress testing by allocating enough memory in lua to run through exmem, and got some odd results in get_meminfo

Test case in chdkptp is
Code: [Select]
=t={} i=1 repeat t[i]=string.rep(tostring(math.random(0xffff)),math.random(1,3)) i=i+1 until get_meminfo('exmem').free_block_max_size < 100 or i > 20000 return i,get_meminfo('exmem')
This generates a bunch of random strings until free_block_max_size gets below 100 (with a 20000 iteration limit just for sanity)

Doing this, it returns after a ~600 iterations with exmem meminfo like

Code: [Select]
allocated_size=150080,
 free_size=236000,
 free_block_count=6,
 free_block_max_size=24,
Which doesn't make much sense to me, since ~230K in 6 free block should leave at least one bigger than 24.

sx160 has ARAM, exmem and a fair amount of Canon heap. In the normal CHDK code, the test should burn through ARAM and then exmem before putting much in the canon heap.

Changing the condition to free_size < 1000 runs longer, and appears to allocate more
Code: [Select]
con 33> =t={} i=1 repeat t[i]=string.rep(tostring(math.random(0xffff)),math.random(1,3)) i=i+1 until get_meminfo('exmem').free_size < 1000 or i > 20000 return i,get_meminfo('exmem')
34:return:4001
34:return:table:
...
free_size=924
allocated_size=357988,
free_block_count=44,
allocated_count=3485,
free_block_max_size=208


Doing a standalone get_meminfo('exmem') gives more expected results
Code: [Select]
free_size=270768
allocated_size=116360
free_block_count=1,
allocated_count=1,
free_block_max_size=270768,
total_size=387216

Incorrect free_block_max_size is potentially bad, because it's used as a proxy for free memory in some cases (core_get_free_memory)
« Last Edit: 22 / October / 2017, 01:16:13 by reyalp »
Don't forget what the H stands for.

*

Offline philmoz

  • *****
  • 3450
    • Photos
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #66 on: 22 / October / 2017, 02:22:49 »
I tried to do some general stress testing by allocating enough memory in lua to run through exmem, and got some odd results in get_meminfo

Test case in chdkptp is
Code: [Select]
=t={} i=1 repeat t[i]=string.rep(tostring(math.random(0xffff)),math.random(1,3)) i=i+1 until get_meminfo('exmem').free_block_max_size < 100 or i > 20000 return i,get_meminfo('exmem')
This generates a bunch of random strings until free_block_max_size gets below 100 (with a 20000 iteration limit just for sanity)

Doing this, it returns after a ~600 iterations with exmem meminfo like

Code: [Select]
allocated_size=150080,
 free_size=236000,
 free_block_count=6,
 free_block_max_size=24,
Which doesn't make much sense to me, since ~230K in 6 free block should leave at least one bigger than 24.


Could this be caused by the PTP code that allocates 1/2 the largest free block size and then releases it?
The get_meminfo values you see might be retrieved while this large block has been taken by PTP.

Attached is the program I used to test the code (on the host not CHDK).
Place in the core directory and compile with "gcc -DSUBA_DEBUG=1 suba_tst.c suba.c"
This tests various edge cases then loops 1,000,000 times doing random allocations and frees.
It checks the maintained stats against the calculated ones each time - it has not given me any errors (but that's not to say I don't have a logic error somewhere).

Phil.
« Last Edit: 22 / October / 2017, 02:29:28 by philmoz »
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)
  g7x2 (1.01a, 1.01b, 1.10b)

*

Offline reyalp

  • ******
  • 14079
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #67 on: 22 / October / 2017, 02:50:34 »
A few other general thoughts:

One thing about using Canon asserts from CHDK code is it leaves evidence of CHDK in ROM, which conceivably be noticed by Canon service. OTOH, regular crashes where we trigger an exception or assert indirectly show up there anyway. IIRC some parts of the wiki said "canon will never know without the SD card" but this is not strictly true.

I'm not particularly worried about this, but I'd be interested if others have opinions. If there were concerns, we could control what our assert wrapper does at runtime.


Incorrect CAM_3ARG_DebugAssert definitions would very likely go unnoticed. Probably not a big deal.

Should this be backported to 1.4? It's a significant crash bug, which argues for yes, but the suba re-write makes me a bit nervous, and there will be some conflicts involving the ports that aren't in 1.4.

I'd probably up the semaphore timeout once we're satisfied it's working. If it times out it means something went very wrong, so hanging for ~1 sec should be no issue. I worry that there might be rare cases where shooting could keep something being scheduled for 200ms.

Not sure if it was intended to be, but SUBA_MAGIC doesn't seem to be checked. A similar check value at the end of the pool might also be good.

suba_init enforces size > 2048. Conceivably, I think CHDK in ARAM could leave less than that (but probably shouldn't, since minor changes might push it over the edge) We could skip the ARAM heap in that case.
Don't forget what the H stands for.


*

Offline reyalp

  • ******
  • 14079
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #68 on: 22 / October / 2017, 03:01:02 »
Could this be caused by the PTP code that allocates 1/2 the largest free block size and then releases it?
I may be missing something, but I don't think that can account for it. The buffer is only allocated for commands that need it. While the script is running, only PTP_CHDK_ScriptStatus should be called, and that doesn't use a buffer. (in the past I think it did always allocate)

core_get_free_memory also uses the "combined" meminfo, so should see the largest free block in the Canon heap once exmem is used.

edit:
In any case

free_size=236000
free_block_max_size=24
free_block_count=6

is clearly inconsistent. You can't divide 236000 by 6 and end up with a max size of 24. I wondered if overhead was being counted incorrectly somewhere, but allocated_count was only 127
« Last Edit: 22 / October / 2017, 03:19:32 by reyalp »
Don't forget what the H stands for.

*

Offline philmoz

  • *****
  • 3450
    • Photos
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #69 on: 22 / October / 2017, 03:06:05 »
A few other general thoughts:

One thing about using Canon asserts from CHDK code is it leaves evidence of CHDK in ROM, which conceivably be noticed by Canon service. OTOH, regular crashes where we trigger an exception or assert indirectly show up there anyway. IIRC some parts of the wiki said "canon will never know without the SD card" but this is not strictly true.

I'm not particularly worried about this, but I'd be interested if others have opinions. If there were concerns, we could control what our assert wrapper does at runtime.

Doesn't really worry me; but it might be better to have the checks/asserts disabled in the auto build.

Quote
Incorrect CAM_3ARG_DebugAssert definitions would very likely go unnoticed. Probably not a big deal.

Should this be backported to 1.4? It's a significant crash bug, which argues for yes, but the suba re-write makes me a bit nervous, and there will be some conflicts involving the ports that aren't in 1.4.


Alternatively retire 1.4, make 1.5 the release build and start 1.6 for waterwingz' new UI.


Quote
I'd probably up the semaphore timeout once we're satisfied it's working. If it times out it means something went very wrong, so hanging for ~1 sec should be no issue. I worry that there might be rare cases where shooting could keep something being scheduled for 200ms.


Good idea.


Quote
Not sure if it was intended to be, but SUBA_MAGIC doesn't seem to be checked. A similar check value at the end of the pool might also be good.


This was used in the original if the allocator was re-initialised. We never do this so the code was removed.
But checking it is a good idea - I'll add this to suba_getmeminfo. I have a spare spot at the end of the memory block for a check value as well, so I'll do that to.


Quote
suba_init enforces size > 2048. Conceivably, I think CHDK in ARAM could leave less than that (but probably shouldn't, since minor changes might push it over the edge) We could skip the ARAM heap in that case.


Yes, I added that to avoid creating small allocators if CHDK almost filled ARAM.


Phil.
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)
  g7x2 (1.01a, 1.01b, 1.10b)

 

Related Topics