Some analysis of whim's results on the freshly formatted card:
- Results are very stable, no slow writes to confuse things.
- Average total time (save_time column) is 1427 ms for the old code, 968 ms for the new. A decent improvement.
- Each reverse of the buffer takes ~310 ms, for 620 ms total (rev_time and derev_time).
- Average non-reversing time for the old code (file open, badpixel, header + thumb creation, writing header, writing data, closing file) is 813 ms.
- Note: badpixel isn't explicitly timed, but it happens between header creation (hdr_c_end) and thumbnail start (thm_c_start) if using DNG 1.2. I've been testing with DNG 1.3, so I haven't really looked at badpixel. There might be room for improvement, but users who really want speed can use 1.3 and fix the badpixels later.
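The budget implied by these averages can be checked with a few lines of arithmetic. This is only a sketch: the names are made up for illustration, the values are the averages quoted above, and it assumes everything is in ms and that the best case is the old non-reversing time with both reversals completely hidden behind the write.

```python
# Averages quoted above; units assumed to be ms.
OLD_TOTAL = 1427          # save_time, old code
NEW_TOTAL = 968           # save_time, new code
REVERSE_ONE_PASS = 310    # approx. time for one pass over the buffer
NON_REVERSING_OLD = 813   # everything except the two reversals, old code

reverse_both = 2 * REVERSE_ONE_PASS                # rev_time + derev_time
implied_non_reversing = OLD_TOTAL - reverse_both   # cross-check against 813
saved = OLD_TOTAL - NEW_TOTAL                      # gain of new over old code
# Assumption: the optimum is the non-reversing time, i.e. reversal costs nothing.
gap_to_optimum = NEW_TOTAL - NON_REVERSING_OLD

print(f"both reversals:            {reverse_both} ms")
print(f"implied non-reversing:     {implied_non_reversing} ms")
print(f"saved by new code:         {saved} ms")
print(f"gap to assumed optimum:    {gap_to_optimum} ms")
```

The implied non-reversing time (807 ms) is within a few ms of the measured 813 ms, so the averages are at least consistent with each other.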
So we are about 150 ms off the theoretical optimum. Where are we losing it?
- Write time (total time spent in the write loop, including sleeps waiting for the reverser) is 710 ms in the new code, compared to 665 ms to just write the buffer all at once. 20 ms of that is spent sleeping, waiting for the reverser (write_wait). The remaining ~25 ms may be overhead from multiple writes.
- Header create + badpixel + thumbnail all need to happen before reversing starts, and take 118 ms. That only leaves ~700 ms of parallelizable time.
- file open starts at 120 ms
- reversing starts at 130 ms, header writing at about the same time.
- file writing starts at 140 ms. This is too soon for the first chunk to be reversed (full buffer = 310ms, 18 chunks = ~17ms/chunk), so it waits.
- reversing takes 440 ms. There is a 10 ms sleep per chunk, which should make the total ~490 ms, but I've noticed in other places that N*sleep(10) takes less than N*10 ms.
- de-reversing takes ~390 ms. There are no sleeps except when it's waiting for the write task (finish_wait, average 40 ms). The final chunk (17 ms) must happen after writing finishes.
- 440 + 390 = 830 ms of reversing. Closing adds ~20 ms to the 710 ms write time, bringing it to 730 ms. So reversing is ~100 ms too long. derev_end - close_end averages 92 ms, confirming this.
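The timeline numbers in this list can be cross-checked the same way. A rough sketch using only the averages quoted above; the names are illustrative and not taken from the actual code.

```python
# Averages quoted in the list above; units assumed to be ms.
FULL_REVERSE = 310      # one pass over the whole buffer (approx.)
CHUNKS = 18
SLEEP_PER_CHUNK = 10    # sleep in the reverse task after each chunk
WRITE_LOOP = 710        # new code, includes waiting for the reverser
WRITE_SINGLE = 665      # writing the whole buffer in one call
WRITE_WAIT = 20         # part of WRITE_LOOP spent sleeping (write_wait)
CLOSE = 20
REV_MEASURED = 440
DEREV_MEASURED = 390

per_chunk = FULL_REVERSE / CHUNKS                               # about 17 ms
rev_expected = FULL_REVERSE + CHUNKS * SLEEP_PER_CHUNK          # if sleeps took a full 10 ms
multi_write_overhead = WRITE_LOOP - WRITE_SINGLE - WRITE_WAIT   # unexplained write cost
reversing_total = REV_MEASURED + DEREV_MEASURED
write_plus_close = WRITE_LOOP + CLOSE
overrun = reversing_total - write_plus_close                    # compare with the measured 92 ms

print(f"per chunk:             {per_chunk:.1f} ms")
print(f"expected reversing:    {rev_expected} ms (measured {REV_MEASURED})")
print(f"multi-write overhead:  {multi_write_overhead} ms")
print(f"reversing overrun:     {overrun} ms")
```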
Conclusion:
On a camera with a reasonably fast card, smallish sensor and slow processor, there isn't actually "plenty" of time for reversing. Eliminating (some of?) the sleep in the reverse task would seem like the obvious thing to do, but I've tried a few variations of this and always got worse results.
In this situation, making the thumbnail creation parallel wouldn't actually help.
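To put a number on "not plenty", here is a sketch of the headroom. It assumes the parallelizable window is the old non-reversing time minus the serial header + badpixel + thumbnail step, and that every sleep in the reverse task costs its full 10 ms. The names are illustrative.

```python
# Averages quoted earlier in the post; units assumed to be ms.
NON_REVERSING_OLD = 813
SERIAL_PREP = 118       # header + badpixel + thumbnail, before reversing starts
FULL_REVERSE = 310      # one pass over the buffer
CHUNKS = 18
SLEEP_PER_CHUNK = 10

window = NON_REVERSING_OLD - SERIAL_PREP                  # time available to hide reversing
pure_reversing = 2 * FULL_REVERSE                         # reverse + de-reverse, no sleeps
slack_no_sleep = window - pure_reversing
with_sleeps = pure_reversing + CHUNKS * SLEEP_PER_CHUNK   # sleeps only in the reverse pass
slack_with_sleeps = window - with_sleeps                  # negative means it does not fit

print(f"window:             {window} ms")
print(f"slack, no sleeps:   {slack_no_sleep} ms")
print(f"slack, with sleeps: {slack_with_sleeps} ms")
```

Under these assumptions the slack is only about 75 ms even with no sleeps at all, and the sleeps push the total past the window. The sum ignores any contention between the reverse and write tasks, so the real slack is probably smaller still.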