The main issue with motion detection speed was that the current buffer was unknown, so the best-case and worst-case scenarios each had a 1/3 chance of happening. This effectively gave me a lag of roughly 65-160 milliseconds between the first event and taking the picture. I mainly measured this with my Speedlite 580 EX II, for lack of a better tool. It can fire multiple flashes at a given frequency. I hope I may assume a 350-euro flash is accurate enough.
Test setup: Powershot S5IS 1.01b, manual mode 1/1000 F/2.7, Tv override 0.4" (to capture the second flash but keep the display dark), motion detection set to shoot immediately (in MD, not in the script), Speedlite 580 EX II set to multi mode, 2 flashes, 4-20 Hz. I took several test pictures: I got few hits at 12 Hz, and anything faster rarely registered. That worked out to the 65-160 ms lag I mentioned above.
Anyway, now to get to the good bits. I've done some quick digging: I searched for the viewport address, hoping to find surrounding code that indicates the current buffer. I got a near-immediate hit. I found some code which does exactly what I want, but a live hex view showed me that the variables involved never changed. I did notice, however, that the value at the base address (which those variables were offset from) changed rapidly between 0, 1 and 2. These were the numbers I was looking for, so I put them in the motion detector. All tests 'misfired': I got nothing at the higher speeds and only saw my flash dimming at some slower speeds. Maybe that variable indicates the buffer currently ready for drawing or... well, I don't know. Anyway, I subtracted 1 from that value (wrapping 0 around to 2) and suddenly I got 4/5 shots with my flash 'dimming' at 15 Hz (I never had any success there before). At 14 Hz, 3/4 were completely bright and one showed my flash dimming again. This quick test suggests that I just brought the lag down to 65-75 ms.
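For anyone wanting to redo the arithmetic behind those lag figures: the gap between two flashes in multi mode is simply 1000 / frequency milliseconds, and catching the second flash means the lag was at most about one gap. A minimal sketch (the function name is mine, it is not from CHDK):

```c
#include <assert.h>

/* Time between two consecutive flashes, in milliseconds,
 * for a multi-flash frequency given in Hz.
 * Hypothetical helper, only meant to illustrate the estimate. */
double flash_interval_ms(double hz)
{
    assert(hz > 0.0);
    return 1000.0 / hz;
}
```

With this, 15 Hz gives about 66.7 ms and 14 Hz about 71.4 ms, which is roughly where the 65-75 ms estimate comes from; 12 Hz gives about 83.3 ms.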
These were only basic tests, though, and I haven't looked closely into the code and other possibilities yet. I don't have time to do that, either, which brings me to the bad news. I have some important exams coming up again and I still have to send in my camera for repair (sensor dust), so I decided now would be a good time to do that. Consequently, I won't have my camera for two weeks, so I can't/won't code anything for it either. I'm hoping someone continues with what I just found out, or you'll have to wait... well, a couple of extra weeks.
Here's the deal (S5IS 1.01b):
Converting a memory dump of the viewport + a few MB gives me an image of three instances of the viewport, which suggests three separate buffers.
I found an instance of 0x10D29360 (viewport) in sub_FF83EC98, at ROM:FF83F81C. The subroutine seems to have something to do with the live view (string "LiveImage.c"). Immediately below the viewport address, there's 0x7E9 (the viewport buffer size shifted 8 bits to the right). Somewhere below that is code which loads a value from somewhere (base address in R7), multiplies 0x7E9 by that value, shifts the result 8 bits to the left and adds it to the viewport buffer address, which is then stored somewhere. A look at the memory reveals that the multiplier stays at 0 and the resulting address doesn't change either (still 0x10D29360). The base address in R7 is assigned to R4 at the beginning of the function, it is 0x281C. I used this address as 'base' for my hex viewer, so I could see the value there change from 0 to 1 to 2 to 0 and so on... faster than the GUI refresh, though, so it looks kind of random.
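To make the address calculation described above a bit more concrete, here is a rough sketch in plain C. The constants are the ones quoted in this post; the macro and function names are made up by me, and the count of three buffers is only what the memory dump suggests:

```c
#include <assert.h>
#include <stdint.h>

#define VIEWPORT_BASE 0x10D29360u /* viewport address found in the ROM */
#define BUF_SIZE_SHR8 0x7E9u      /* buffer size shifted right by 8 bits */
#define NUM_BUFFERS   3u          /* assumption: three buffers, per the dump */

/* Address of viewport buffer `index`: base + ((index * 0x7E9) << 8).
 * Hypothetical helper mirroring the multiply/shift/add seen in the ROM. */
uint32_t viewport_buffer_addr(uint32_t index)
{
    assert(index < NUM_BUFFERS);
    return VIEWPORT_BASE + ((index * BUF_SIZE_SHR8) << 8);
}
```

Index 0 gives the viewport address itself, and each further index adds 0x7E900 bytes (518400, which would fit a 720x240 buffer at 3 bytes per pixel pair position, though I haven't verified the layout).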
In motion_detector.c, immediately after assigning the buffer (so line 383), I added the line:
    img += *((long *) 0x218C) * 0x7E900;
This didn't give me any good results, so I changed it to:
    /* Use the buffer before the one indicated at 0x218C, wrapping 0 to 2. */
    long bufoff = *((long *) 0x218C);
    if (bufoff == 0) {
        bufoff = 2;
    } else {
        bufoff--;
    }
    /* 0x7E900 is the size of one viewport buffer in bytes. */
    img += bufoff * 0x7E900;
This seems to work brilliantly. I hope this is useful to someone.
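The same wrap-around can also be written with modulo arithmetic, which can be tried on a PC without the camera. This is only a sketch: the function names are mine, and the three-buffer count is the assumption from the memory dump above.

```c
#include <assert.h>

#define NUM_BUFFERS 3       /* assumption: three viewport buffers */
#define BUFFER_SIZE 0x7E900L /* size of one buffer in bytes, from the post */

/* Index of the buffer just before `current`, wrapping 0 around to 2. */
long previous_buffer(long current)
{
    assert(current >= 0 && current < NUM_BUFFERS);
    return (current + NUM_BUFFERS - 1) % NUM_BUFFERS;
}

/* Byte offset to add to the viewport pointer for that previous buffer. */
long previous_buffer_offset(long current)
{
    return previous_buffer(current) * BUFFER_SIZE;
}
```

So a counter value of 0 selects buffer 2, 1 selects buffer 0, and 2 selects buffer 1, exactly like the if/else version.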
Oh, by the way, I profiled the motion detector. It seems to finish well within 10 milliseconds (the system time, which is accurate to 10 ms, reads the same before and after), so that's not really the bottleneck.
I guess the remaining 65-75 ms are due to the 10 ms worst-case keyboard lag and the code required for shooting initialization.
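A note on reading that profiling number: with a clock that only ticks every 10 ms, a measured difference is off by up to one tick either way, so "same system time" only proves the run took less than 10 ms. A small sketch of that reasoning (hypothetical helper names, nothing CHDK-specific):

```c
#include <assert.h>

/* Smallest true duration consistent with a measurement taken on a
 * clock that only advances in steps of resolution_ms. */
long duration_lower_bound_ms(long measured_ms, long resolution_ms)
{
    assert(measured_ms >= 0 && resolution_ms > 0);
    return measured_ms > resolution_ms ? measured_ms - resolution_ms : 0;
}

/* Exclusive upper bound: the true duration is strictly below this. */
long duration_upper_bound_ms(long measured_ms, long resolution_ms)
{
    assert(measured_ms >= 0 && resolution_ms > 0);
    return measured_ms + resolution_ms;
}
```

For a measured difference of 0 on the 10 ms clock this gives a range from 0 up to (but not including) 10 ms, which matches the "well within 10 milliseconds" statement.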