Your methods are working. I'll have to try my way as well. The sensor is so hot, that black level is significantly affected though. How does this log compare to your experiments?
@jmac698 Sorry for the slow response. I haven't actually looked at that script much since that thread in 2018, and most of the tests I did weren't directly comparable.
Anyway, I did some more tests, and while I'm not entirely sure they will answer your questions, maybe you'll find something useful.
I assume your script subtracts black level
Not for the "meter" value. This likely is the biggest confounding factor in your numbers above. It is simply the raw sensor value, so would need 127 subtracted (I mistakenly said 128 in an earlier post, but most 12 bit cams including yours and the ones I tested are hard coded to 127). The blacklevel used is recorded in the "desc" column.
The meter96 and m_96* values do have the black level removed in the conversion to APEX*96. I almost never look at the raw values with these kinds of scripts, because the meter96 is so much more convenient to relate to exposure. Of course, this assumes the conversion is valid, but experience suggests it's good enough.
With blacklevel subtracted, I get 4.2 for meter(f/4)/meter(f/8) in your data, or about 5% error, which doesn't seem unreasonable.
The predicted ratio is 2^(1/(2*96/32)), so 96/32 means 1/3 stop, and 2 is the ratio of 1 stop, so this is 2^1/6.
Maybe I'm misunderstanding or missing something, but if you mean the ratio between (raw - blacklevel) counts at 1/3 stop increments, shouldn't it just be 2^(1/3) (= ~1.25992)?
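To make the arithmetic concrete, here is a minimal Python sketch of the expected ratio of (raw - blacklevel) counts for an exposure change given in APEX*96 units. The function name is just for illustration and is not from the script.

```python
def expected_ratio(ev96_step):
    """Expected ratio of (raw - blacklevel) counts for a change of
    ev96_step APEX*96 units (96 units = 1 stop = a factor of 2)."""
    return 2.0 ** (ev96_step / 96.0)

print(round(expected_ratio(32), 5))   # 1/3 stop: 1.25992
print(expected_ratio(96))             # 1 stop: 2.0
print(expected_ratio(192))            # 2 stops, e.g. f/4 to f/8: 4.0
```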
Looking at your log, there's quite a lot of shot to shot variation. The "WARN: high error" message is triggered when the measured value varies more than 6 APEX*96 units (1/16th of a stop) from the predicted value. The threshold is arbitrary, but I believe I picked it because it wasn't common in the runs I did while developing the script.
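For a sense of scale, here is a rough Python sketch of that kind of check. It assumes the error is simply 96*log2 of the ratio of black level subtracted values; the script's actual conversion and sign convention may differ, and the function names and raw values are made up for illustration.

```python
import math

HIGH_ERROR_EV96 = 6  # 6/96 = 1/16 stop, the threshold described above

def ev96_error(measured, predicted, blacklevel=127):
    """Difference between measured and predicted raw values in APEX*96 units,
    assuming a plain log2 relationship after black level subtraction."""
    return 96 * math.log2((measured - blacklevel) / (predicted - blacklevel))

def is_high_error(measured, predicted, blacklevel=127):
    """True if the difference exceeds the threshold in either direction."""
    return abs(ev96_error(measured, predicted, blacklevel)) > HIGH_ERROR_EV96

# Made-up raw values: 5% above the prediction once black level is removed
print(round(ev96_error(1177, 1127), 2))  # 6.76
print(is_high_error(1177, 1127))         # True
```

So under these assumptions a 5% difference in black level subtracted counts is already slightly over the threshold.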
I've attached a zip with .ods spreadsheets of a few of my runs, as well as yours. You should be able to load them in google docs if you don't use LibreOffice or similar.
The rows in the 'calcs' tab average the values for each ten shot step, and include various ways I tried comparing them. IMO the most relevant are:
meter - bl and m_pred - bl are the average raw meter value and predicted meter value with black level subtracted
prev_frac is the ratio between the meter value, and the meter value of the preceding step
"1 stop frac" is similar to prev_frac, but for the next full stop
m96_err is the average error (difference between the expected and actual meter96 value). Note that for the first step, this just reflects variation in the baseline shots
Of the three runs from my cameras, two are from sx710 and one is from sx730. I think the sx710-avtest-20231022-3-calcs.ods run had some change in lighting after the baseline shot, but I didn't really run enough tests to confirm that.
These were done under artificial (compact fluorescent) lighting in a room at night with blinds drawn, facing a white wall.
Yours has a bit more variation, but it seems broadly similar.
It's not obvious to me how much of the error is lighting variation, variation in how close the hardware gets to ideal values, or other factors.
NOTE, my runs were done with a newer version of the script (also included), and 10 shots for the baseline. I fudged the one from your CSV (avtest-jmac698-20231014-calcs.ods) by adding the missing columns (av96_cur2 is simply duplicated from av96_cur, m_pred reverses the conversion of m96_pred)
Some background on how the script works:
It takes the number of shots specified by "Baseline shots" and averages those to get the expected raw value for the current exposure setting. This also acts as the first step in the test.
The exposure is measured by averaging a 1200 pixel square in the center of the sensor, with 5 pixel steps in x and y. The step being odd means all the colors in the CFA are sampled.
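To illustrate why the odd step matters, here is a small Python sketch that counts which positions of a CFA cell a sampling grid lands on. It assumes a 2x2 Bayer pattern and reads "1200 pixel square" as 1200 pixels on a side; the origin (0, 0) and the function name are placeholders, not what the script uses.

```python
from collections import Counter

def cfa_positions_sampled(x0, y0, size, step):
    """Count samples landing on each (x % 2, y % 2) position of an
    assumed 2x2 CFA cell, for a size x size square sampled every step pixels."""
    counts = Counter()
    for y in range(y0, y0 + size, step):
        for x in range(x0, x0 + size, step):
            counts[(x % 2, y % 2)] += 1
    return counts

# Odd step: all four CFA positions are hit, in equal numbers
print(sorted(cfa_positions_sampled(0, 0, 1200, 5).items()))
# Even step: every sample lands on the same CFA position
print(sorted(cfa_positions_sampled(0, 0, 1200, 4).items()))
```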
For each step after the baseline, it changes the aperture and takes "Shots per step" shots. The m96_err column represents the difference (in APEX*96 units) between the actual and expected exposure.
Since the baseline is measured once at the start, any subsequent changes in lighting will affect all the following shots. It might make more sense to measure the relative difference between each step, or to do both comparisons.
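A step to step comparison could look something like the Python sketch below. This is not in the script; the function name and the raw values are made up, and it assumes each step stops down (less light) by a known number of APEX*96 units.

```python
import math

def step_errors(step_means, blacklevel, ev96_steps):
    """For each step after the first, return measured minus expected
    exposure change (APEX*96 units) relative to the preceding step.

    step_means: average raw meter value per step
    ev96_steps: expected change between consecutive steps, APEX*96 units
    """
    errors = []
    for i in range(1, len(step_means)):
        prev = step_means[i - 1] - blacklevel
        cur = step_means[i] - blacklevel
        measured = 96 * math.log2(prev / cur)  # positive when light decreases
        errors.append(measured - ev96_steps[i - 1])
    return errors

# Made-up averages that halve exactly at each one stop step
print(step_errors([1127, 627, 377], 127, [96, 96]))  # [0.0, 0.0]
```

Because each step is only compared with the one before it, a lighting change partway through the run would show up as a single outlier rather than an offset in every following step.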