Most time waste happens in reverting bytes order in RAW buffer, but I don't know how avoid this procedure or decrease its time (now 2.5 CPU instructions/byte).
Looking at the code, I thought that reversing the bytes might be a bottleneck. (I hope you don't mind me correcting your English: "reverting" has a similar meaning to "reversing", but it's not the correct term to use here.) I spent quite some time reading through the DNG specification to see if it would be possible to avoid the byte reversal. Unfortunately the spec is quite clear that packed data must be in big-endian order, even if the TIFF header specifies that the file is in little-endian order.
I had two thoughts on this:
a) Writing the routine in assembly language will produce some quite good savings, especially since the ARM instruction set is good at byte manipulation. There is an instruction that specifically reverses the byte order within each 16-bit halfword (REV16), see: ARM Information Center
Using the REV16 instruction could get it down to 4 CPU instructions per 4 bytes (LDR, REV16, STR, branch).
b) It would be useful to have an option not to reverse the bytes. That way, people who wanted to could post-process the DNG file after downloading. But this is complicated by the need to somehow store the byte order in the file (naturally there isn't a tag available for this). Probably something to do later on.
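To illustrate point a), here is a rough C sketch of what one REV16 does to a 32-bit word, plus a loop that handles four bytes per iteration. The function names are made up for this example and are not taken from the existing code. I'm assuming the buffer is 4-byte aligned and its length is a whole number of 32-bit words. A hand-written assembly loop would still be needed to guarantee the LDR/REV16/STR sequence, since a compiler may or may not turn this pattern into REV16.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes inside each 16-bit half of a 32-bit word.
   This is the same result ARM's REV16 gives for one register. */
static inline uint32_t swap_bytes_in_halfwords(uint32_t w)
{
    return ((w & 0x00FF00FFu) << 8) | ((w & 0xFF00FF00u) >> 8);
}

/* Hypothetical helper: swap adjacent byte pairs across a whole buffer,
   in place, one 32-bit word (four bytes) per iteration.
   Assumes buf is 4-byte aligned and n_words is the length in words. */
static void reverse_bytes_buffer(uint32_t *buf, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++)
        buf[i] = swap_bytes_in_halfwords(buf[i]);
}
```

Working on whole words instead of single bytes is where the saving comes from: one load and one store cover four bytes.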
Martin