Writing DNG in parallel with reversing - page 4 - General Discussion and Assistance - CHDK Forum

Writing DNG in parallel with reversing

  • 83 Replies
  • 18115 Views

Offline reyalp

  • ******
  • 13393
Re: Writing DNG in parallel with reversing
« Reply #30 on: 26 / May / 2013, 15:11:15 »
Quote
In fact, which one is faster with a concurrent task: writing in chunks, or one bulk buffer? You are suggesting bulk, but I am intrigued by the reason.
I have tested this quite a bit. All else being equal, larger chunks appear to be better. The only reason not to write the whole thing in one go is to let the reverser get started while some writing is happening, and to allow most of the de-reversing to happen in parallel with the writing.
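As a toy illustration of that overlap (this is not the actual CHDK code; in the real version the write runs in a separate task and the SD DMA frees the CPU, so the de-reverse genuinely runs in parallel), the chunked pipeline amounts to something like the following. All names are made up:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-chunk de-reverse step. */
static void reverse_chunk(uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        uint8_t t = p[i]; p[i] = p[n - 1 - i]; p[n - 1 - i] = t;
    }
}

/* Pipeline: while chunk i is being "written" (here just a memcpy,
   in reality a DMA-driven file write in another task), chunk i+1
   gets de-reversed. Interleaving them in one thread only shows the
   ordering, not the real parallelism. */
static void write_pipelined(uint8_t *buf, size_t total, size_t chunk,
                            uint8_t *out)
{
    size_t done = 0;
    if (total)
        reverse_chunk(buf, chunk < total ? chunk : total);
    while (done < total) {
        size_t n = (total - done) < chunk ? (total - done) : chunk;
        memcpy(out + done, buf + done, n);   /* "start write" of ready chunk */
        done += n;
        if (done < total) {                  /* overlap: prep the next chunk */
            size_t m = (total - done) < chunk ? (total - done) : chunk;
            reverse_chunk(buf + done, m);
        }
    }
}
```

The larger the chunk, the less per-chunk overhead; the smaller the chunk, the earlier the writer can start, which is exactly the tradeoff described above.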

The SD hardware uses DMA, so the CPU barely needs to be involved at all; it just gets an interrupt when the write is done. Apparently, the bus traffic generated doing this isn't enough to slow reversing down much at all.

Quote
I like the DNG init task, since ~50% of DNG tags don't change on a per-picture basis :)
Did you measure what improvement this makes? To me, optimizing without measurement is a waste of time. Things frequently don't behave as we expect; my original parallel task version was slower than the serial version. That's why my code records a whole bunch of timing statistics, and why I wrote a script to get statistically meaningful samples.
Don't forget what the H stands for.


Offline reyalp

  • ******
  • 13393
Re: Writing DNG in parallel with reversing
« Reply #31 on: 26 / May / 2013, 16:32:54 »
Quote
I have tested on s2is. No problems so far. DNG saving ~1s faster than from CHDK trunk 1.2.
Thanks.

Since s2is is a vxworks camera, I would expect it to have the hang problem, where the camera sometimes locks up while saving DNG. Have you run the test script from this post: http://chdk.setepontos.com/index.php?topic=9970.msg100756#msg100756 ?

I'm still quite confused by this problem. Brain dump follows... this is more to organize my thoughts than anything else; I'm not really expecting others to wade through it.

I've confirmed that it happens in a few different ways. Usually it's spytask that locks up, somewhere in free_dng_header. It can happen in the free() or the ufree(), but sometimes kbd_task is the task that locks. It's unclear what has actually happened when kbd_task is locked, since then I am unable to do any sort of debugging.

I added code to kbd_task to make the power LED constantly flicker. This tells me at a glance whether the main loop is still running or not.

Since PTP (sometimes) continues to work when spytask hangs, I wrote some code to dump all the TCBs to a file. I run this using a PTP call function, and then download the file with PTP. Interestingly, sometimes the PTP download fails with ptp error 0x2002, apparently because open fails on any file I try to open. Other (non-lua) PTP functions like rmem and callfunction continue to work. In these cases, the file is there and readable after rebooting.

The TCB includes the task context (registers from the last time it was suspended) and the running state. The PC is always in the vxworks task code, but the LR varies. When a task is "hung" in this situation, it always appears to be in the "ready" state; I have never seen other tasks in that state. As far as I understand, this should mean it will get scheduled the next time OS scheduling rules permit, yet it never appears to change after the hang. Most other tasks are in "suspend", which makes sense because they are typically waiting on some IPC object. kbd_task, where the dump code actually runs, is always status 0 (running?). tClockSave also seems to be status 0, but that appears to be the case when the camera is operating normally. The same is sometimes true of spytask.

Interestingly, calling taskResume (as identified by the vxworks FLIRT sigs) on the "hung" task will sometimes allow the shot to complete and the script to continue, although a crash or hard hang usually follows fairly soon after. There's another function identified as "ResumeTask" which I have not yet tried.

I tried moving the free_dng_header call into the DNG writer task, directly after the actual header write. This causes the writer task to hang instead of spytask; otherwise the symptoms seem similar.

In several of the spytask hangs, I noticed the LR of the task context was in this code (a540, 100b):
Code: [Select]
ROM:FFEE08FC 00 00 A0 03                 MOVEQ   R0, #0
ROM:FFEE0900 E3 34 00 0B                 BLEQ    f_taskSuspend
ROM:FFEE0904             loc_FFEE0904                            ; CODE XREF: memPartFree+F0j
ROM:FFEE0904 F0 06 9F E5                 LDR     R0, =0x110003
Note this is in memPartFree, which is the underlying code for free() and ufree().

Another oddity in the TCB dump is the task options field, which shows as 0x104 for all tasks except tExcTask and tLogTask. According to the vxworks docs I can find, 0x100 is VX_NO_STACK_FILL, but 4 is not documented. This is based on the reversed TCB in includes/vxwork.h, which might have errors.

Some other things I have tried:
- Make the main task function ARM code. Theory: ending the task in thumb makes the vxworks kernel do something wrong. But I may have messed this up; since the task was created from thumb, it might still actually be using the automatically generated interworking...
- Move the file open/close into the writer task, just passing the filename to write_dng. Theory: sharing file descriptors between tasks is bad. Vxworks docs say it's OK, but maybe the FsIoNotify stuff could be confused. I think this is a good idea anyway, but it doesn't help.
- Use cached vs uncached DNG buffer. Theory: the cache cleaning code might be doing something weird.
- Use different SD cards ;)

One thing I'm tempted to test is making the DNG writer task get re-used instead of exiting; this is how most of the Canon tasks operate. However, unless we want it constantly sitting in a sleep loop, we would need to get some task synchronization object functions (semaphore, message queue, event flag) into the sig finder. Using these could gain a bit of efficiency in the process too, since we wouldn't have to do 10ms sleeps while the various tasks wait for each other.
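A rough sketch of what that re-usable writer could look like, using POSIX threads purely as a stand-in for the VxWorks semaphore/message-queue primitives we don't have in the sigs yet; every name here is hypothetical:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One job slot guarded by a condition variable: the persistent writer
   task blocks until work arrives instead of polling with 10ms sleeps,
   and the shooting task blocks until the write is done. */
static pthread_mutex_t lock       = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_ready = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  work_done  = PTHREAD_COND_INITIALIZER;
static const char *pending_name;   /* filename to write, NULL = idle */
static bool quit;
static int  shots_written;

static void *writer_task(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (!pending_name && !quit)
            pthread_cond_wait(&work_ready, &lock); /* sleep, no polling */
        if (quit)
            break;
        /* ... write_dng(pending_name) would run here ... */
        shots_written++;
        pending_name = NULL;
        pthread_cond_signal(&work_done);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Called from the shooting task for each shot. */
static void queue_write(const char *name)
{
    pthread_mutex_lock(&lock);
    pending_name = name;
    pthread_cond_signal(&work_ready);
    while (pending_name)                  /* wait for completion */
        pthread_cond_wait(&work_done, &lock);
    pthread_mutex_unlock(&lock);
}
```

On the camera the condition variable would be a VxWorks semaphore or event flag, but the shape is the same: the task is spawned once and then parked between shots.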

I discovered that msleep(0) does *not* yield, contrary to my comment in the current svn code; it's effectively a no-op on both vxworks and dryos. This is slightly odd, because inserting it did seem to allow dng_writer to get started a bit earlier on dryos.

Attached for future reference are my TCB dumping code and a chdkptp script to parse the results. Note there's some other cruft in the dumping code; I'm sure anyone who needs it can sort it out.

The way to call this over PTP is to find req_task_dump in your main.bin.dump file, and then call it like:
Code: [Select]
!con:call_function(0x94c2d)
Note you need to add 1 to the address in the main.bin.dump listing since it's thumb.

The decoding code can be used like this:
Code: [Select]
!m=require'extras/vxtcbdump'
!d=m.load('tasks.dmp')
!d:list_tasks()
!d.byname.tSpyTask:print()

edit:
I should add that I've done long runs (>50 shots) using only the original single task code without hitting the hang.
« Last Edit: 26 / May / 2013, 17:23:06 by reyalp »

Re: Writing DNG in parallel with reversing
« Reply #32 on: 26 / May / 2013, 17:31:28 »
Forgive me if I don't follow your whole post - I'm a little bit drunk right now... :)

However, I have tested the script you linked and the s2is finished the task perfectly - the script took all 20 photos with RAW/DNG enabled. The cam didn't hang; no other problems. It just finished and remained responsive to the keyboard. No crash symptoms.

I attach the csv file produced by the script.

I hope this is helpful for you.

Best regards for the work!
if (2*b || !2*b) {
    cout<<question
}

Compile error: poor Yorick


Offline reyalp

  • ******
  • 13393
Re: Writing DNG in parallel with reversing
« Reply #33 on: 26 / May / 2013, 17:59:09 »
Quote
Forgive me if I don't follow your whole post - I'm a little bit drunk right now... :)
Don't worry, only the first part was aimed at you.

If you wouldn't mind running it a bit longer, I'd be interested to know if it hangs eventually; 10 shots using the new code isn't much. I'd suggest the following settings, which will run 54 shots with the new code (6 shots for each of the 3x3 chunk-size combinations). That should either be enough to trigger the hang, or provide some performance data with different sizes if it doesn't.

shots per test 6
rev chunk kb 256
rev chunk steps 3
end chunk kb 256
end chunk steps 3

If this isn't convenient, don't bother, it's probably not going to make a difference in solving the issue.


edit:
Looking at the csv you posted, DNG saving with the old code is around 3.3 seconds. With the new code, saving time is about 2.8 seconds. The actual time to just write the raw data is around 2.7, so this is pretty good.
« Last Edit: 26 / May / 2013, 18:04:09 by reyalp »



Offline reyalp

  • ******
  • 13393
Re: Writing DNG in parallel with reversing
« Reply #34 on: 26 / May / 2013, 22:36:25 »
I may have fixed the vxworks issue: in revision 2800, I increased the stack size of the writer task to 2k. I've done a couple of long runs without any hangs.


Offline reyalp

  • ******
  • 13393
Re: Writing DNG in parallel with reversing
« Reply #35 on: 26 / May / 2013, 23:42:08 »
In changeset 2801, I added yet more timing information. The attached update to the lua script records it.


Offline dvip

  • ****
  • 451
Re: Writing DNG in parallel with reversing
« Reply #36 on: 26 / May / 2013, 23:55:11 »
I just tested branches_dng-async-write-2800.zip and it works fine on my A590is/101b.
It writes the DNG in about 3 secs, which is great. :)
I didn't run the test lua script, but the DNG writing sure feels faster than before.


Offline dvip

  • ****
  • 451
Re: Writing DNG in parallel with reversing
« Reply #37 on: 27 / May / 2013, 00:22:01 »
branches_dng-async-write-2801.zip runs fine too on my A590is/101b.
Attached is the dngtest.csv file.



Offline reyalp

  • ******
  • 13393
Re: Writing DNG in parallel with reversing
« Reply #38 on: 27 / May / 2013, 04:39:14 »
Thumbnail creation time is actually fairly significant: ~110ms for the current trunk code on the a540. Faster cameras suffer less, since the cost depends on thumbnail size rather than sensor size; the D10 is ~50ms.

Getting it from the live view would be nice, but it appears difficult to come up with an implementation that is reliable and doesn't require platform-specific changes. Creating it before the shot starts could also add shutter latency, which is IMO worse than extra saving time.

In changeset 2804, I got rid of the multiplies and divides in the main loop, which cut the time roughly in half. Our CPU has no DIV instruction, so divides are particularly bad.
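For anyone curious what that kind of change looks like in general (an illustration, not the actual changeset): the classic fix is strength reduction, replacing per-pixel multiplies and divides with running additions, e.g. turning a `y*width + x` address calculation into an advancing row pointer:

```c
#include <stdint.h>

/* Naive version: one multiply per pixel for the address. */
static uint32_t sum_mul(const uint8_t *buf, int w, int h)
{
    uint32_t s = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            s += buf[y * w + x];   /* multiply every iteration */
    return s;
}

/* Strength-reduced version: the row base advances by addition, so
   the inner loop is pure pointer arithmetic. On a core with no DIV
   instruction, getting divides out of the loop matters even more. */
static uint32_t sum_add(const uint8_t *buf, int w, int h)
{
    uint32_t s = 0;
    const uint8_t *row = buf;
    for (int y = 0; y < h; y++, row += w)
        for (int x = 0; x < w; x++)
            s += row[x];
    return s;
}
```

Both functions compute the same thing; only the addressing differs.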

The time to create the rest of the header is negligible. Creating the gamma map for the first time does seem to take significant time (~50ms on the a540), so I'll convert it to a pre-calculated table.


Offline philmoz

  • *****
  • 3332
    • Photos
Re: Writing DNG in parallel with reversing
« Reply #39 on: 27 / May / 2013, 05:47:25 »
Quote
The first time the gamma map is created does seem to take significant time (~50ms on a540), so I'll convert that to a pre-calculated table.

Current table values below:
Code: [Select]
static unsigned char gamma[256] =
{
0,15,22,27,31,35,39,42,45,47,50,52,75,77,79,82,
84,86,88,90,92,93,95,97,99,100,102,103,105,106,108,109,
111,112,113,115,116,117,119,120,121,122,123,125,126,127,128,129,
130,131,132,133,134,136,137,138,139,140,141,141,142,143,144,145,
180,181,181,182,183,183,184,185,185,186,187,187,188,189,189,190,
190,191,192,192,193,193,194,194,195,195,196,197,197,198,198,199,
199,200,200,201,201,202,202,203,203,204,204,205,205,206,206,207,
207,208,208,208,209,209,210,210,211,211,212,212,212,213,213,214,
214,215,215,215,216,216,217,217,217,218,218,219,219,219,220,220,
221,221,221,222,222,222,223,223,224,224,224,225,225,225,226,226,
226,227,227,228,228,228,229,229,229,230,230,230,231,231,231,232,
232,232,233,233,233,234,234,234,235,235,235,235,236,236,236,237,
237,237,238,238,238,239,239,239,239,240,240,240,241,241,241,242,
242,242,242,243,243,243,244,244,244,244,245,245,245,246,246,246,
246,247,247,247,247,248,248,248,249,249,249,249,250,250,250,250,
251,251,251,251,252,252,252,252,253,253,253,253,254,254,254,255
};

It's probably also worth testing whether rounding the dng header size up to a 512-byte boundary makes any appreciable difference.
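That rounding is the usual sector-align idiom; a sketch (names are mine, not CHDK's):

```c
#include <stddef.h>

/* Round a size up to a whole number of 512-byte SD sectors, so the
   raw data following the header starts sector-aligned and writes
   never straddle a partial sector. Works for any power-of-two size. */
#define SD_SECTOR 512u

static size_t round_up_sector(size_t n)
{
    return (n + SD_SECTOR - 1) & ~(size_t)(SD_SECTOR - 1);
}
```

Whether the aligned write is actually faster on a given card is exactly the kind of thing that needs measuring rather than assuming.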

Phil.
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)
  g7x2 (1.01a, 1.01b, 1.10b)

 
