Another host -> camera transfer bug:
I discovered on SX730 HS (dryos r59p4), PTP/IP uploads larger than PTP buffer size (meaning multiple calls to data->recv_data) hang for a minute and close the connection.
This only affects PTP/IP (wifi) and does not affect my earlier wifi enabled cams: elph130 (dryos r52), g7x (r55p6) and sx710 (r57).
The cause appears to be a bug in the Canon firmware. Given that we've seen several other bugs related to large or specifically sized uploads (see
here and
here for example), it seems likely Canon's own software doesn't use large uploads in a single transaction and their testing doesn't exercise it.
The SX730 PTP/IP code appears to have been refactored significantly compared to the earlier cams. In particular, it adds tasks called PtpipController and PtpipPacketDetector which are directly involved in the affected code. I haven't checked in detail yet, but other cameras with the tasks may well be affected. These include G7X2 (r58p10) G9X2 (r59p3), ixus190_elph200 (r59p3) M100 (r59p4) M5 (r59p3) M6 (r59p3) SX430IS (r59p3) SX620HS (r58p7)
(edit: Actually on the r58 cams, the task is "PtpipContoroller" [sic], not "PtpipController"
but it appears functionally the equivalent)
The digic 7 cameras may be different since some network stuff is handled on a different CPU core ("Lime") but on g7x2 at least, the structure of PtpipController and PtpipPacketDetector seems very similar to sx730.
SX740 (Digic 8, r60) and xf605 firmware do not have these task names, suggesting the new unified EOS/PowerShot/DV codebase is unlikely to be affected.
The bug itself appears to be due to the call to FUN_fc162848 (SX730 firmware 100d) in FUN_fc21fe8e. FUN_fc162848 resets a structure used by PtpipPacketDetector, including
removing the command socket from the fd_set used by the select() call in PtpipPacketDetector. Since the socket is removed, the next call to select blocks indefinitely waiting for data, and the eventflag wait at fc2212ac in PTP/IP data->recv_data times out after 60s, triggering an error which closes the connection. As far as I can tell this is not a deliberate error case (which typically either assert or immediately trigger an error state that closes the connection), and the Canon code is simply wrong. This is supported by the fact a standard SendObject call without CHDK running fails in a similar way (though it hangs until the battery is removed rather than disconnecting).
The error case is only hit when a call to data->recv_data reads the specified data size, and there is still more data in the PTP transaction. For CHDK commands, the trigger size will usually be around half of free memory, which ends up around 250K for SX730 (a wifi connection reduces free mem significantly). For native Canon commands (or CHDK ports configured with CAM_PTP_USE_NATIVE_BUFFER) it should be around 256K.
There is clearly logic in the code that is supposed to handle the multiple recv_data call case. In fact, bypassing the problematic call to FUN_fc162848 and twiddling a couple other things appears to completely fix the problem, both for my chdkptp tests and SendObject. So I've done that for SX730 100d (and 100e since it uses the same build) in trunk r6279.
I haven't yet implemented the fix for the other SX730 firmware, since it involves a bunch of firmware specific addresses and I suspect chdkptp PTP/IP usage is very rare, but I can probably add if anyone needs it, or implement the same for other affected models.
You can test if a CHDK supported camera is affected using chdkptp (from recent svn, built with PTPIP support, not the binary builds) by initiating a PTP/IP connection on the camera side and then using
chdkptp -e'exec require"camtests".tests:do_test{connect_dev="-h=<camera IP address>",skip_names={"shoot"}}'
Note this will take several minutes. If it fails, there will be error messages and the final output will include a failure count.
I've attached my initial fix with my final debugging / logging code for future reference, since it includes some potentially useful hooking that isn't need in the final fix. This includes hooking socket recv() as well as the various callbacks used by PtpipController