Writing DNG in parallel with reversing - General Discussion and Assistance - CHDK Forum

reyalp
Writing DNG in parallel with reversing
« on: 18 / May / 2013, 04:54:01 »
Something I've wanted to try for a long time is doing the DNG writes in a different task from the byte reversing, on the assumption that once the DMA for the write is fired off, the OS will yield the task doing the write and we can let the reversing continue in parallel.

Finally got around to trying it. The attached patch reduces DNG saving on my D10 from ~3.7 sec to ~3.2 sec. CHDK raw is ~3.0 sec, so this brings the overhead of DNG pretty close to negligible.

I've tried a whole bunch of variants to get to this point (differing chunk sizes, sleep locations, etc.), but there's probably more room for improvement. Using proper IPC functions rather than globals and sleeps could potentially improve things.

It runs and seems to give a modest improvement on a540, but I haven't tried to optimize it there yet.
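For context, a minimal sketch of what the reversing step looks like when it is done in chunks. It assumes the "reverse" swaps the two bytes of every 16-bit halfword; the actual CHDK transform may differ, and `swap_halfword_bytes` and `reverse_in_chunks` are hypothetical names. A swap like this is its own inverse, so running it again over the same range restores the original data, which is what re-reversing relies on. Working in chunks is what gives a writer task something to start on before the whole buffer is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swap the two bytes of each 16-bit halfword.
 * 'from' and 'to' may be the same buffer (in-place reversal) or two
 * different buffers (out-of-place). 'count' is in bytes and must be even. */
static void swap_halfword_bytes(const uint8_t *from, uint8_t *to, size_t count)
{
    assert(count % 2 == 0);
    for (size_t i = 0; i < count; i += 2) {
        uint8_t first = from[i];       /* read both bytes before writing, */
        uint8_t second = from[i + 1];  /* so in-place use is safe        */
        to[i] = second;
        to[i + 1] = first;
    }
}

/* Reverse 'size' bytes of 'buf' in place, 'chunk' bytes at a time.
 * Returns the number of bytes reversed. */
static size_t reverse_in_chunks(uint8_t *buf, size_t size, size_t chunk)
{
    assert(chunk > 0 && chunk % 2 == 0);
    size_t done = 0;
    while (done < size) {
        size_t n = (size - done < chunk) ? size - done : chunk;
        swap_halfword_bytes(buf + done, buf + done, n);
        done += n;
        /* A threaded version would publish 'done' here so the writer
         * task knows how far it may write (hypothetical). */
    }
    return done;
}
```

The comment marks where a threaded version would presumably publish its progress; in the patch described above that coordination is done with globals and sleeps rather than proper IPC.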
« Last Edit: 18 / May / 2013, 23:02:44 by reyalp »
Don't forget what the H stands for.

philmoz
Re: Writing DNG in parallel with reversing
« Reply #1 on: 18 / May / 2013, 05:24:12 »

Nice idea.

On the G1X & G12 it generates corrupted DNG files - the file size is only ~2MB.

It seems that because there are two raw buffers, the loop to re-reverse the buffer is skipped.
It returns to the caller where the file handle is closed before the writing is finished.

Edit: OK, I got it to work on the G1X by waiting for the write to finish (instead of re-reversing the buffer).
The 'dng_end_ptr' value is off by one, so the DNG file is 1 byte too small.
It should be:
    dng_end_ptr = dng_write_ptr + camera_sensor.raw_size;
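The fix follows the usual half-open range convention: the end pointer points one past the last byte, so the number of bytes still to write is just `end - current`, and subtracting 1 anywhere drops the final byte. A minimal sketch using a local buffer; `byte_range`, `make_range` and `remaining_bytes` are hypothetical, and `raw_size` stands in for `camera_sensor.raw_size`.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [start, end): 'end' points one past the last byte. */
typedef struct {
    const char *start;
    const char *end;
} byte_range;

static byte_range make_range(const char *start, size_t raw_size)
{
    byte_range r = { start, start + raw_size };  /* not start + raw_size - 1 */
    return r;
}

/* Bytes still to write once everything before 'written_to' is done. */
static size_t remaining_bytes(byte_range r, const char *written_to)
{
    assert(written_to >= r.start && written_to <= r.end);
    return (size_t)(r.end - written_to);
}
```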

G1X performance:
DNG no patch   = 1.7secs
DNG + patch    = 1.5secs
RAW save time = 1.2secs

G12 performance:
DNG no patch   = 1.9secs
DNG + patch    = 1.8secs
RAW save time = 1.6secs

Phil.
« Last Edit: 18 / May / 2013, 06:41:39 by philmoz »
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)

dvip
Re: Writing DNG in parallel with reversing
« Reply #2 on: 18 / May / 2013, 13:24:25 »
>The attached patch reduces DNG saving on my D10 from ~3.7
>sec to ~3.2 sec. CHDK raw is ~ 3.0 sec, so this gets the
>overhead of DNG pretty close to negligible.

Cool! A nice DNG saving improvement.
I would like to try this on my A590is.



reyalp
Re: Writing DNG in parallel with reversing
« Reply #3 on: 18 / May / 2013, 14:46:59 »
Quote
It seems that because there are two raw buffers, the loop to re-reverse the buffer is skipped.
It returns to the caller where the file handle is closed before the writing is finished.
Oops, you are right. I moved the wait into the re-reverse loop and didn't think about that. As I'm sure you can tell, it's very much a proof of concept at the moment.

Some other thoughts:
We could actually start the reversing before writing the thumbnail and EXIF. Maybe reversing should be the additional task, and writing should be in the main thread. On the other hand, a generic background write task might be more flexible, but in the case of DNG we need to know how much has been written so we know how far we can re-reverse.

There are other places we might take advantage of this. If we knew when in the canon firmware the raw buffer becomes invalid, we could potentially improve the performance of regular raw as well.

Also a note on testing:
I have found that the first shot after booting is usually substantially slower than the average, on the order of half a second or more for the d10.

reyalp
Re: Writing DNG in parallel with reversing
« Reply #4 on: 18 / May / 2013, 18:23:32 »
Updated patch
- fixes the off-by-one and multiple raw buffer issues.
- sleeps after every reverse chunk, which seems to give more stable times.
- bumped the priority of the writer task. The idea is that the OS will be more likely to run it again once a write is done. I tried going even lower (16, to be higher priority than spytask) but it didn't seem to make any noticeable difference.

a540 seems to do DNG in ~1.3 sec, where the trunk would take about 1.6 sec. CHDK raw is about 1 sec.

I had some unexplained hangs on a540 while working on this. I haven't seen them with the current patch, but there may be some issue lurking.

philmoz
Re: Writing DNG in parallel with reversing
« Reply #5 on: 18 / May / 2013, 19:25:21 »

Tested on the SX40:
DNG no patch = 1.1secs
DNG + patch = 1.0sec
RAW saving = 0.9secs

Some additional comments & questions:
- I moved the header and thumbnail writes into dng_writer, and created the dng_writer task before the reverse loop. Doesn't improve performance; but makes the code cleaner (removes the need for 'task_created').
- I incremented rsleep while waiting for the write to finish - most of the time is spent here.
- Removing the limit on 'size' in dng_writer (to only write a max of DNG_CHUNK_SIZE bytes per loop) reduced the DNG time from 1.5 to 1.4 secs on the G1X.
- Using cached or uncached memory makes no difference to the save times (G1X). Is the call to 'dcache_clean_all' necessary? I commented it out and the DNG files still get saved correctly.

All my cameras also take longer to save the first RAW/DNG file after power up; but this is the case with the existing trunk / release-1.1 code as well.

Phil.

reyalp
Re: Writing DNG in parallel with reversing
« Reply #6 on: 18 / May / 2013, 19:42:27 »
Quote
- I moved the header and thumbnail writes into dng_writer, and created the dng_writer task before the reverse loop. Doesn't improve performance; but makes the code cleaner (removes the need for 'task_created').
Good to know
Quote
- I incremented rsleep while waiting for the write to finish - most of the time is spent here.
Makes sense, reversing the entire buffer is going to be faster than writing it out.
Quote
- Removing the limit on 'size' in dng_writer (to only write a max of DNG_CHUNK_SIZE bytes per loop) reduced the DNG time from 1.5 to 1.4 secs on the G1X.
G1X has two raw buffers, so this makes sense, because you don't have to wait around to re-reverse.

I actually had it this way at the start, but limiting each write to one chunk seems to help significantly on cameras with only one raw buffer. I think the reason goes like this:
- The first chunk is reversed (this is not in parallel with writing in my code).
- The writer starts writing.
- Reversing finishes the entire buffer before the first write is done.
- The writer sees the buffer is done, so the next write does everything that's left. dng_write_ptr is left pointing at the end of the first chunk. wc = 2
- The unreverser reverses up to the end of the first chunk, and can't advance until all the writing is finished.
- The write finishes and advances dng_write_ptr to the end of the buffer, so the unreverser does most of its work not in parallel with writing.

There should be better ways to handle this than fixed sized chunks.
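The sequence above comes down to two rules: the writer decides how much of the reversed region to hand to the next write, and the un-reverser may only restore bytes that have already been written (it cannot go past dng_write_ptr). A minimal sketch of those rules as pure functions, using byte offsets instead of pointers; `dng_progress`, `next_write_size`, `unreverse_limit` and the `cap_to_chunk` flag are hypothetical names, not the code in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Progress as byte offsets into the raw buffer (hypothetical struct). */
typedef struct {
    size_t written;   /* end of completed writes, like dng_write_ptr */
    size_t reversed;  /* end of the region that has been reversed */
    size_t size;      /* total raw data size */
} dng_progress;

/* Bytes to hand to the next write; 0 means nothing is ready (writer waits). */
static size_t next_write_size(dng_progress p, size_t chunk, int cap_to_chunk)
{
    assert(p.written <= p.reversed && p.reversed <= p.size);
    size_t ready = p.reversed - p.written;
    if (cap_to_chunk && ready > chunk)
        return chunk;
    return ready;
}

/* The un-reverser may only restore bytes whose write has completed. */
static size_t unreverse_limit(dng_progress p)
{
    return p.written;
}
```

With the cap off, once the whole buffer is reversed the second write takes everything that is left, and while it runs the un-reverser is stuck at the end of the first chunk. With the cap on, the write pointer advances one chunk at a time, so the un-reverser can follow it.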
Quote
- Using cached or uncached memory makes no difference to the save times (G1X). Is the call to 'dcache_clean_all' necessary? I commented it out and the DNG files still get saved correctly.
If the reversing is happening on the cached address and the writing is happening from the uncached address, then it is theoretically required. In practice though, the reversing is probably going to get ahead of the writing (and push the stuff that's being written out of cache) so it would probably work OK anyway. If reversing is done in uncached memory, then it shouldn't matter.

Quote
All my cameras also take longer to save the first RAW/DNG file after power up; but this is the case with the existing trunk / release-1.1 code as well.
Agreed, this was a general observation, not specific to the dng task stuff.

reyalp
Re: Writing DNG in parallel with reversing
« Reply #7 on: 18 / May / 2013, 23:49:09 »
I created a branch for this https://tools.assembla.com/svn/chdk/branches/dng-async-write

New version checked in
- Writes header in dng_write task, per philmoz suggestion
- For cameras with multiple raw buffers, writes as much data as is ready
- For cameras with only one raw buffer, try to leave DNG_END_MARGIN to overlap with dereversing
- Doesn't clean dcache if not using cached raw
- Use msleep(0) immediately after creating the task to let it run. I verified that this does in fact start the task (if it's higher priority), and calling msleep(0) does not wait 10 ms. This sometimes allows the reversing to start before the thumbs are finished.

On d10, this gives the best performance so far (~3180 ms)

On a540, it seems a bit worse than the work-3 version (1400 ms vs 1300).

Some comments on the debug counters
wsleep (ws in misc debug)
counts how many times the file writer task waited. This happens if there isn't reversed data past the written data. This can happen if writing the header finishes before the first chunk is completed. Lower is better, since writing is the most time consuming part, you don't want this task waiting around.

wcount (wc)
how many chunks were written. Generally, lower is better. Optimal will probably be 2 on cameras with multiple raw buffers, or 3 on cameras with only one.

rsleep (rs)
how many times the main task waited for the write to complete (in the de-reversing process for cams with only one raw buffer, or waiting at the end).

adjustable parameters
DNG_CHUNK_SIZE
how big a chunk the reversing is done in. Smaller values mean the writer task can get to work sooner, but may cause it to do more writes.

DNG_END_MARGIN
This only applies to cameras with one raw buffer, and is meant to avoid the situation I described in my previous post. This defines the minimum amount of the buffer that will have to be de-reversed after all writes are completed.


It's not clear to me how to optimize this for different cameras.
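One way to read the DNG_END_MARGIN rule, as a sketch: on cameras with more than one raw buffer the writer takes everything that is ready; on single-buffer cameras it stops short of the last DNG_END_MARGIN bytes, so that the final write of that margin overlaps with de-reversing everything before it. This is an interpretation of the description above, not the code in the branch; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy: how many bytes the writer should hand to the next
 * write call. Offsets are relative to the start of the raw data. */
static size_t next_write_size_margin(size_t written, size_t reversed, size_t size,
                                     size_t end_margin, int multiple_raw_buffers)
{
    assert(written <= reversed && reversed <= size);
    if (multiple_raw_buffers || end_margin >= size)
        return reversed - written;            /* write whatever is ready */

    size_t hold_back_at = size - end_margin;  /* start of the tail margin */
    if (written < hold_back_at) {
        /* Write up to the margin, but never past what is reversed. */
        size_t limit = reversed < hold_back_at ? reversed : hold_back_at;
        return limit - written;
    }
    return reversed - written;                /* the tail itself */
}
```

Stepping through it with a 300-byte buffer, a 100-byte first chunk and a 50-byte margin (all hypothetical numbers) gives writes of 100, 150 and 50 bytes, i.e. three writes, in line with the wc = 3 estimate for single-buffer cameras. The last 50 bytes are then what has to be de-reversed after all writes complete.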

philmoz
Re: Writing DNG in parallel with reversing
« Reply #8 on: 19 / May / 2013, 00:03:34 »

This version reduces the G1X DNG write time to 1.3secs.

Quote
It's not clear to me how to optimize this for different cameras.

Put the default values in camera.h so they can be overridden in platform_camera.h, and store the runtime values in the camera_info structure.
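A sketch of that pattern, shown as one translation unit with the port's header seen before the defaults. The default numbers, the `dng_tuning` struct (a stand-in for fields in camera_info) and the function names are all hypothetical, and `#ifndef` is only one way to do the override; the existing camera.h convention may differ. The runtime copy is what would presumably let something like a test script vary the sizes without rebuilding.

```c
#include <assert.h>
#include <stddef.h>

/* platform_camera.h for one port (hypothetical override) */
#define DNG_END_MARGIN (256 * 1024)

/* camera.h: defaults, applied only if the port did not define them */
#ifndef DNG_CHUNK_SIZE
#define DNG_CHUNK_SIZE (512 * 1024)  /* hypothetical default */
#endif
#ifndef DNG_END_MARGIN
#define DNG_END_MARGIN (128 * 1024)  /* hypothetical default */
#endif

/* Stand-in for the runtime fields in camera_info (hypothetical names). */
typedef struct {
    size_t dng_chunk_size;
    size_t dng_end_margin;
} dng_tuning;

static dng_tuning dng_tuning_defaults(void)
{
    dng_tuning t = { DNG_CHUNK_SIZE, DNG_END_MARGIN };
    return t;
}

/* Runtime override, e.g. from a test script; chunk size must be non-zero. */
static void dng_tuning_set(dng_tuning *t, size_t chunk, size_t margin)
{
    assert(chunk > 0);
    t->dng_chunk_size = chunk;
    t->dng_end_margin = margin;
}
```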

Phil.

reyalp
Re: Writing DNG in parallel with reversing
« Reply #9 on: 19 / May / 2013, 21:00:47 »
To get better insight into what the code is actually doing and how the various variables affect it, I wrote a test script for use with the current code (r 2796) in the branch.

This will run a series of shots and record various statistics to a file called A/dngtest.csv

The default is to take 10 shots with the old dng saving method, to get a baseline. It will then take 10 with the new method and the default values for the chunk sizes.

If you set either of the "step" values, it will double the corresponding chunk size with each step. Each "end chunk" size is tried for each "rev chunk" size, so the total number of shots will be

<old code shots> + <shots per test>*<rev steps>*<end steps>

Needless to say, this can get large and fill up your card quite quickly.
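To get a feel for how quickly that grows, here is the formula as a small C sketch. The script itself isn't reproduced here, so `total_shots` and `chunk_at_step` are hypothetical helpers, and "steps" is assumed to be the number of sizes tried (so 1 means only the default). For example, 10 baseline shots plus 10 shots for each of 4 reverse-chunk sizes and 3 end sizes already comes to 10 + 10*4*3 = 130 shots.

```c
#include <assert.h>

/* Total shots for one run of the test script, per the formula above.
 * rev_steps and end_steps are the number of sizes tried (>= 1). */
static long total_shots(long old_code_shots, long shots_per_test,
                        long rev_steps, long end_steps)
{
    assert(rev_steps >= 1 && end_steps >= 1);
    return old_code_shots + shots_per_test * rev_steps * end_steps;
}

/* Chunk size doubles with each step: step 0 is the starting size. */
static long chunk_at_step(long start_size, int step)
{
    return start_size << step;
}
```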

I would be interested to see what the results are with a sample of cameras. If you run it, please also state your SD card speed rating.


I am still getting hangs on my a540. I would be curious to know if anyone else gets hangs on VxWorks cameras. I can provide test builds if desired; just give your model and firmware version.