suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras) - General Discussion and Assistance - CHDK Forum

suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)

  • 77 Replies
  • 27324 Views
Advertisements
mod edit: I split this from https://chdk.setepontos.com/index.php?board=6.0 as the issue is not related to multicam rigs.



Latest test description

2 poles of 8 cameras using hardware switch.

Test now includes image delete. Function usb_sync_wait currently remains as previously described since the switch is opened as soon as all lcd's go black.

After connecting with multicam.lua the testing sequence is always the same:
 
    start cameraside script
    lock ptp comms
    switch to record
     
repeat:
 
    preshoot
    usb_sync_wait -> at all black lcd, open switch to start exposure
    switch to play
    download image
    delete image
    switch to record

Nine repetitions of this occurred without fail. On the 10th, step 5 (delete image), Cam6 made the usb dis sound with lens extended.

romlog & modules.log attached (the camera date is not current but is consistent with romlog - I must have missed updating that).

The 16 cameras are powered from 2 identical supplies (as previously described). All power supply leads are still 3m in length.

Any comment appreciated.

CLI Trace:
Code: [Select]
___> !mc=require('multicam')
___> !mc:connect({list='C:/CHDKPTP/listfile'})
+ 1:Canon PowerShot SX150 IS b=\\.\libusb0-0001--0x04a9-0x3234 d=bus-0 s=9A95243
8E0A647E9AC0179DDC30582D5
+ 2:Canon PowerShot SX150 IS b=\\.\libusb0-0002--0x04a9-0x3234 d=bus-0 s=689D500
5104943C09E988E4EA8301E22
+ 3:Canon PowerShot SX150 IS b=\\.\libusb0-0003--0x04a9-0x3234 d=bus-0 s=E21B7D7
37E5D4291B5DD048230963CF6
+ 4:Canon PowerShot SX150 IS b=\\.\libusb0-0004--0x04a9-0x3234 d=bus-0 s=449A58C
8297A4CCDAD05327C6BDD6FC2
+ 5:Canon PowerShot SX150 IS b=\\.\libusb0-0005--0x04a9-0x3234 d=bus-0 s=A42727D
40A154E39945D810173405135
+ 6:Canon PowerShot SX150 IS b=\\.\libusb0-0006--0x04a9-0x3234 d=bus-0 s=6C13E0E
B6FF04FEF9D0F50BFAB5B9420
+ 7:Canon PowerShot SX150 IS b=\\.\libusb0-0007--0x04a9-0x3234 d=bus-0 s=F5C3101
2304A49F4B4D84B8736A8714C
+ 8:Canon PowerShot SX150 IS b=\\.\libusb0-0008--0x04a9-0x3234 d=bus-0 s=52358FA
70E8940F9B145A6C7053BB251
+ 9:Canon PowerShot SX150 IS b=\\.\libusb0-0009--0x04a9-0x3234 d=bus-0 s=3AE0C06
EB29A4819B9E48764D58B9E2D
+ 10:Canon PowerShot SX150 IS b=\\.\libusb0-0010--0x04a9-0x3234 d=bus-0 s=D10AC1
5E25614FEBB5AF36C058AE46E1
+ 11:Canon PowerShot SX150 IS b=\\.\libusb0-0011--0x04a9-0x3234 d=bus-0 s=774F38
7C25634767917D45F411CF1C0C
+ 12:Canon PowerShot SX150 IS b=\\.\libusb0-0012--0x04a9-0x3234 d=bus-0 s=FE454C
0413DA4B4F8B00FC8B52DD2853
+ 13:Canon PowerShot SX150 IS b=\\.\libusb0-0013--0x04a9-0x3234 d=bus-0 s=31DD97
F7523B4654B55781E96DE9D38E
+ 14:Canon PowerShot SX150 IS b=\\.\libusb0-0014--0x04a9-0x3234 d=bus-0 s=82AF4B
BBDDE74EC69EB783057889B66E
+ 15:Canon PowerShot SX150 IS b=\\.\libusb0-0015--0x04a9-0x3234 d=bus-0 s=9E5157
ABAF954861A345B6C0775E14BE
+ 16:Canon PowerShot SX150 IS b=\\.\libusb0-0016--0x04a9-0x3234 d=bus-0 s=2C638E
36A31242B8B600ED3341D91884
___> !mc:start()
___> !return mc:cmdwait('lock_ptp_comms')
lock_ptp_comms
___> !return mc:cmdwait('rec')
rec
___> !return mc:cmdwait('preshoot')
preshoot
___> !return mc:cmdwait('usb_sync_wait')
usb_sync_wait
___> !return mc:cmdwait('play')
play
___> !mc:download_images({lastimg=1,fmatch='JPG$'})
___> !mc:delete_images_list(mc:imglist(),{verbose=true})
os.remove("A/DCIM/141___09/IMG_3512.JPG")
os.remove("A/DCIM/140___09/IMG_3335.JPG")
os.remove("A/DCIM/140___09/IMG_2373.JPG")
os.remove("A/DCIM/109___09/IMG_1292.JPG")
os.remove("A/DCIM/139___09/IMG_2394.JPG")
os.remove("A/DCIM/804___05/IMG_1895.JPG")
os.remove("A/DCIM/804___09/IMG_1891.JPG")
os.remove("A/DCIM/804___09/IMG_1744.JPG")
os.remove("A/DCIM/114___09/IMG_0948.JPG")
os.remove("A/DCIM/156___09/IMG_2470.JPG")
os.remove("A/DCIM/114___09/IMG_0497.JPG")
os.remove("A/DCIM/130___01/IMG_1695.JPG")
os.remove("A/DCIM/122___09/IMG_0447.JPG")
os.remove("A/DCIM/101___09/IMG_0007.JPG")
os.remove("A/DCIM/101___09/IMG_0026.JPG")
os.remove("A/DCIM/101___09/IMG_0027.JPG")
os.remove("A/DCIM/110___09/IMG_0865.JPG")

etc etc

___> !return mc:cmdwait('rec')
rec
___> !return mc:cmdwait('preshoot')
preshoot
___> !return mc:cmdwait('usb_sync_wait')
usb_sync_wait
___> !return mc:cmdwait('play')
play
___> !mc:download_images({lastimg=1,fmatch='JPG$'})
___> !mc:delete_images_list(mc:imglist(),{verbose=true})
os.remove("A/DCIM/141___09/IMG_3520.JPG")
os.remove("A/DCIM/140___09/IMG_3343.JPG")
os.remove("A/DCIM/140___09/IMG_2381.JPG")
os.remove("A/DCIM/109___09/IMG_1300.JPG")
os.remove("A/DCIM/139___09/IMG_2402.JPG")
os.remove("A/DCIM/804___05/IMG_1903.JPG")
os.remove("A/DCIM/804___09/IMG_1899.JPG")
os.remove("A/DCIM/804___09/IMG_1752.JPG")
os.remove("A/DCIM/114___09/IMG_0956.JPG")
os.remove("A/DCIM/156___09/IMG_2478.JPG")
os.remove("A/DCIM/114___09/IMG_0505.JPG")
os.remove("A/DCIM/130___01/IMG_1703.JPG")
os.remove("A/DCIM/122___09/IMG_0455.JPG")
os.remove("A/DCIM/101___09/IMG_0015.JPG")
os.remove("A/DCIM/101___09/IMG_0035.JPG")
os.remove("A/DCIM/110___09/IMG_0873.JPG")
___> !return mc:cmdwait('rec')
rec
___> !return mc:cmdwait('preshoot')
preshoot
___> !return mc:cmdwait('usb_sync_wait')
usb_sync_wait
___> !return mc:cmdwait('play')
play
___> !mc:download_images({lastimg=1,fmatch='JPG$'})
___> !mc:delete_images_list(mc:imglist(),{verbose=true})
ERROR: call failed:I/O error

___>
« Last Edit: 14 / October / 2017, 01:06:17 by reyalp »

*

Offline reyalp

  • ******
  • 14128
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #1 on: 22 / September / 2017, 01:09:07 »
romlog & modules.log attached (the camera date is not current but is consistent with romlog - I must have missed updating that).
Thanks, that romlog definitely points to a bug in CHDK code. I haven't had time to dig into it deeply yet.

For future reference or other interested developers, it appears to be in suba_getmeminfo, which is called from the PTP task to calculate available memory for buffers (which is an ugly thing I've wanted to clean up for a long time, but that's another story).

Code: [Select]
Exception!! Vector 0x10
Occured Time  2015:05:07 11:20:33
Task ID: 39321683
Task name: PTPSessionTA0
Exc Registers:
0x03cc55c0
0x00337268
0x0033726c
0x00031eb0
0x040372e0 R4
0x0002d768
0x00000003
0x0002d7b0
0xffffffff
0x00005b38
0x0019eb48
0x19980218
0x00337270
0x00337210
0x03ca81c1 LR
0x03cb328a PC
0x00000033

Code: [Select]
03cb326c <suba_getmeminfo>:
...
 3cb328a: 6823      ldr r3, [r4, #0]
Don't forget what the H stands for.

Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #2 on: 22 / September / 2017, 04:06:43 »
Ok.

Thanks.

*

Offline philmoz

  • *****
  • 3450
    • Photos
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #3 on: 22 / September / 2017, 20:00:22 »
romlog & modules.log attached (the camera date is not current but is consistent with romlog - I must have missed updating that).
Thanks, that romlog definitely points to a bug in CHDK code. I haven't had time to dig into it deeply yet.

For future reference or other interested developers, it appears to be in suba_getmeminfo, which is called from the PTP task to calculate available memory for buffers (which is an ugly thing I've wanted to clean up for a long time, but that's another story).

Code: [Select]
Exception!! Vector 0x10
Occured Time  2015:05:07 11:20:33
Task ID: 39321683
Task name: PTPSessionTA0
Exc Registers:
0x03cc55c0
0x00337268
0x0033726c
0x00031eb0
0x040372e0   R4
0x0002d768
0x00000003
0x0002d7b0
0xffffffff
0x00005b38
0x0019eb48
0x19980218
0x00337270
0x00337210
0x03ca81c1   LR
0x03cb328a   PC
0x00000033

Code: [Select]
03cb326c <suba_getmeminfo>:
...
 3cb328a:   6823         ldr   r3, [r4, #0]

R0 = (*suba) would appear to be an EXMEM address for the SX150.
R4 = (*c - iterator through free list) is outside the EXMEM range (based on the build settings).

R4 also looks to be outside the camera RAM address range which might explain the data abort exception.

Some possible causes:
- EXMEM corruption (incorrect EXMEM size)
- buffer overrun corrupting suba memory allocator
- bug in the suba code
- race condition across threads using suba malloc/free messing up the allocator

Phil.
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)
  g7x2 (1.01a, 1.01b, 1.10b)

*

Offline reyalp

  • ******
  • 14128
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #4 on: 23 / September / 2017, 02:26:40 »
R0 = (*suba) would appear to be an EXMEM address for the SX150.
R4 = (*c - iterator through free list) is outside the EXMEM range (based on the build settings).
Thanks.
Quote
R4 also looks to be outside the camera RAM address range which might explain the data abort exception.
Yeah, that was my conclusions.

Quote
- race condition across threads using suba malloc/free messing up the allocator
IIRC srsa_4c raising concerns about this. PTP task and lua being busy seems like something that would stress this. In fact, I wonder if one thread calling malloc/free while another calls suba_getmeminfo might be enough (malloc/free fiddling with the pointers while sub_getmeminfo tries to walk them).

OTOH, the fact that only andrew.stephens.754365 has encountered it might point to a port specific issue. The romlog posted earlier from an autobuild (so main.bin.dump) looked pretty similar: R0 = 0x03CC5330, R4 = 0x04041F68, PC = 0x03CB327A
Don't forget what the H stands for.

Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #5 on: 26 / September / 2017, 07:38:45 »
mc:cmd() will return as soon as the commands are sent
...

If you want to get any status messages generated

...

you can use wait_status_msg with the name of your command,

mc:print_cmd_status(mc:wait_status_msg('my_shoot_command'))

Out of interest and to be sure - if a cameraside command included 2 (or more) sets of wait_timeout_write_status then the first occurrence of mc:print_cmd_status would provide the first function write_status(status,msg) write_usb_msg from the camera and a second mc:print_cmd_status would provide the second etc? (is that up to 15 messages?)

If I knew i've forgotten - can (/how) queued camera messages be deleted?


 
« Last Edit: 26 / September / 2017, 13:59:10 by andrew.stephens.754365 »

*

Offline reyalp

  • ******
  • 14128
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #6 on: 26 / September / 2017, 17:07:45 »
Out of interest and to be sure - if a cameraside command included 2 (or more) sets of wait_timeout_write_status then the first occurrence of mc:print_cmd_status would provide the first function write_status(status,msg) write_usb_msg from the camera and a second mc:print_cmd_status would provide the second etc? (is that up to 15 messages?)
mc:print_cmd_status just outputs the status values it is passed. mc:cmdwait() returns the values obtained by mc:wait_status_msg, which waits for a single status message from each camera and returns them. so mc:print_cmd_status(mc:cmdwait(...)) prints the first status messages.

you could use mc:wait_status_msg to retrieve additional messages, but note that it will wait if there are none, so you would probably want to use a short timeout.

Before r746, mc:wait_status_msg required that you pass the name of the command for which status is expected, and will  return an error if a message from a different command is seen. In the current code, you can pass nil or false for the command name to retrieve any messages.
Quote
If I knew i've forgotten - can (/how) queued camera messages be deleted?
mc:flushmsgs discards all pending messages. This is done by default when you call mc:cmd or mc:cmdwait. You can suppress it by including flushmsgs=false in the options.
Don't forget what the H stands for.

Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #7 on: 27 / September / 2017, 04:38:37 »
mc:print_cmd_status just outputs the status values it is passed

Yes, I had meant my post, referring to mc:print_cmd_status, as shorthand for mc:print_cmd_status(mc:wait_status_msg('my_shoot_command')) but it is obvious how that wasn't apparent  ::)

I, think I, can see where multiple messages from a single function could be useful (for development or debug purposes at least).

Thanks for all the detail. (Edit: crosslink to Faster Sync Triggering development
https://chdk.setepontos.com/index.php?topic=12219.msg134784#msg134784)



As I mentioned earlier in this thread:
Quote
I am always conscious though that mphx does not seem (in my memory) to have encountered much in the way of ptp based problems in his multicam build thread

In the circumstances, I currently lean toward a port specific issue - could it be safe to assume a long time (or never) for an SX150IS solution to the ptp reported crash? (it is likely to be "never" before I developed the skills to be useful in a solution to that).
« Last Edit: 27 / September / 2017, 14:34:38 by andrew.stephens.754365 »

*

Offline reyalp

  • ******
  • 14128
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #8 on: 27 / September / 2017, 16:08:36 »
As I mentioned earlier in this thread:
Quote
I am always conscious though that mphx does not seem (in my memory) to have encountered much in the way of ptp based problems in his multicam build thread

In the circumstances, I currently lean toward a port specific issue - could it be safe to assume a long time (or never) for an SX150IS solution to the ptp reported crash? (it is likely to be "never" before I developed the skills to be useful in a solution to that).
I still haven't had time to really look at it. I'll try to get something together this weekend, but it's impossible to guess how hard it will be to fix.

Also, it could be specific to ports with a particular combination of memory settings (exmem, aram etc) rather than being model specific. If it's the kind of generic race condition philmoz suggested, it would still only apply to ports that use at least one of the suba pools.
Don't forget what the H stands for.

*

Offline philmoz

  • *****
  • 3450
    • Photos
Re: suba threading bug (was Re: 3D scanner with multiple IXUS160 cameras)
« Reply #9 on: 27 / September / 2017, 19:16:31 »
Attached is a test build for SX150IS 1.00a.


In this version I've made the following changes:
- enabled EXMEM_TESTING
- added checks in the 'suba' code to look for memory corruption in the free list


Enabling EXMEM_TESTING means that the EXMEM memory won't be used by malloc, although CHDK will still be loaded there. Instead the EXMEM is filled with a guard value and tested regularly to see if it has been altered. If any changes are detected an error is shown on the LCD.


If the 'suba' checks detect any corruption in the free list they will call DebugAssert which will shutdown the camera - a ROMLOG dump will show if the shutdown was in the 'suba' code or not.


If you can run some tests with this version it might help narrow down the problem a bit.


Phil.



(Update: 4/10 old build removed, see later post for new test build.)
« Last Edit: 04 / October / 2017, 04:04:09 by philmoz »
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)
  g7x2 (1.01a, 1.01b, 1.10b)

 

Related Topics


SimplePortal © 2008-2014, SimplePortal