As this is RAW based, can I assume the resultant histogram is linear, i.e. a gamma of 1?
It is generally assumed to be minimally modified raw counts from the sensor, before debayering or any color calibration. Of course, it's all undocumented so assumptions are at your own risk, but the fact CHDK DNG works as well as it does suggests this understanding is generally correct.
I think I understand the xstep and ystep, but could you confirm that if you use an odd number, as you have, e.g. 15, that means you will average the exposure over a 15x15 sensel area of the CFA?
No. It means sample every 15th raw pixel in X and Y; nothing is averaged. (Raw pixels are, again, the values before debayering, which I guess is what you are calling sensels.) FWIW, shot_histogram / get_histo_range uses the equivalent of 31.
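To illustrate the difference between stepping and averaging, here is a rough sketch in plain Python. It is not CHDK code: the function name `strided_histogram`, the tiny 60x45 "sensor" and the 4-bit values are all made up for the example, and starting at pixel 0 is an assumption. It only shows what "sample every Nth pixel in X and Y" means.

```python
def strided_histogram(raw, xstep, ystep, levels):
    """Histogram of every ystep-th row and xstep-th column of raw.

    Each sampled pixel is counted as-is; the pixels in between are
    skipped entirely, not averaged in.
    """
    hist = [0] * levels
    samples = 0
    for y in range(0, len(raw), ystep):
        row = raw[y]
        for x in range(0, len(row), xstep):
            hist[row[x]] += 1
            samples += 1
    return hist, samples


# Hypothetical 60x45 buffer of 4-bit values, purely for illustration.
WIDTH, HEIGHT, LEVELS = 60, 45, 16
raw = [[(x + y) % LEVELS for x in range(WIDTH)] for y in range(HEIGHT)]

hist, samples = strided_histogram(raw, 15, 15, LEVELS)
print(samples, "of", WIDTH * HEIGHT, "pixels sampled")
```

With a step of 15 in both directions, only about 1 in 225 pixels contributes to the histogram, which is why larger steps are faster but coarser.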
Plus, do I assume I can only use the raw histo after taking an image?
Yes, as documented in the rawop page, the raw buffer can only be read inside the raw hook, after a still shot is taken. This is why histo:update() is called after hook_raw.wait_ready(). As I've mentioned several times, raw sensor data is simply not available to CHDK (and almost certainly doesn't exist outside the sensor itself) except when a still shot is being processed.
get_histo_range is based on that same data, so it has the same limitation; it just does the equivalent of histo:update for you automatically.