I thought Phil's update to Lua had fixed the resume bug. However, I triggered the bug again tonight. It seems to happen as it gets dark and the shutter time increases. Here's what it printed:
ERROR: NULL error message
*** TERMINATED ***
Lapser: -9 3209878 0 0
The "Lapser" line prints debug variables I defined. Only the first one is significant here:
lapser1 = -9
This is a lucky break, because the only time lapser1 is -9 is in luascript.c here:
// Wait for a button to be pressed and released (or the timeout to expire)
static int action_stack_AS_LUA_WAIT_CLICK()
{
    lapser1=-9;
    // Check key pressed or timeout
    if ((get_tick_count() >= action_top(2)) || camera_info.state.kbd_last_clicked)
    {
        // If timed out set key state to "no_key", otherwise key pressed so set last checked time
        if (!camera_info.state.kbd_last_clicked)
            camera_info.state.kbd_last_clicked=0xFFFF;
        else
            camera_info.state.kbd_last_checked_time = camera_info.state.kbd_last_clicked_time;
        lapser1=9;
        action_pop_func(1);
        return 1;
    }
    return 0;
}
So it looks like action_stack_AS_LUA_WAIT_CLICK() didn't return, or lapser1 would be 9, not -9. Interesting!
If this is true, it may be that get_tick_count() never returned (EDIT: or messed up the stack somehow?). I suspect that it's not thread safe. Here's the code for get_tick_count() in platform/generic/wrappers.c:
long get_tick_count()
{
    long t;
#if !CAM_DRYOS
    _GetSystemTime(&t);
    return t;
#else
    return (int)_GetSystemTime(&t);
#endif
}
I added a delay in capt_seq_hook_set_nr() that includes this code:
tick=get_tick_count();
while(tick<shot_next)
{
    msleep(10);
    tick=get_tick_count();
}
My theory is that there's a hard interrupt during the call to get_tick_count() above, and then get_tick_count() is called in wait_click(10) and messes up the stack.
But it looks like I could change my wait loop to avoid get_tick_count() like this:
tick=get_tick_count();
if(tick<shot_next)
{
    msleep(shot_next-tick);
    tick=shot_next;
}
I was using msleep(10) so I could abort the wait by setting shot_next to 0 when I exit the program, but I can minimize the calls to get_tick_count() if that will fix the bug.
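A possible middle ground, sketched here and untested on a camera: read the clock once, then sleep in 10 ms slices while advancing a local estimate of the tick, so the wait can still be aborted by setting shot_next to 0. The function name wait_for_shot() is made up for illustration, and get_tick_count() and msleep() below are host-side stand-ins so the sketch compiles off-camera. The estimate drifts if msleep() sleeps longer than asked, so this trades timing accuracy for fewer clock reads.

```c
/* Host-side stand-ins (assumptions) for the CHDK calls. On the camera the
   real get_tick_count() and msleep() would be used instead. */
static long fake_now = 0;     /* simulated tick counter, in ms */
static int  tick_calls = 0;   /* counts how often get_tick_count() ran */

static long get_tick_count(void) { tick_calls++; return fake_now; }
static void msleep(long ms)      { fake_now += ms; }

/* Target time of the next shot; another task sets it to 0 to abort the wait. */
volatile long shot_next = 0;

/* Hypothetical helper: wait until shot_next with a single clock read.
   Returns the locally estimated tick when the wait ends. */
long wait_for_shot(void)
{
    long tick = get_tick_count();          /* the only clock read */
    while (tick < shot_next) {             /* shot_next == 0 ends the loop */
        long remaining = shot_next - tick;
        long slice = remaining < 10 ? remaining : 10;
        msleep(slice);
        tick += slice;                     /* estimate, not a fresh reading */
    }
    return tick;
}
```

With the stand-ins above, waiting from tick 1000 to shot_next = 1035 takes four sleeps (10, 10, 10, 5) and one call to get_tick_count().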
So this may not be a "lua yield" bug after all. It may be that get_tick_count() is called from different threads but isn't thread safe.
So if this is the cause of the bug, I may be able to trigger it more reliably by calling get_tick_count() in a tight loop many times, in both threads. I'll try it and see what happens.
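The shape of that stress test might look like the sketch below. It is a host-side illustration only: it uses POSIX threads and clock_gettime() as a stand-in for get_tick_count(), and the names hammer() and stress_get_tick_count() are made up. On the camera the two loops would have to run in the actual tasks involved (the script side and the capt_seq side), which this sketch does not attempt to reproduce.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Stand-in (assumption) for the CHDK get_tick_count(): milliseconds from a
   monotonic host clock. */
static long get_tick_count(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long)(ts.tv_sec * 1000L + ts.tv_nsec / 1000000L);
}

/* Call get_tick_count() in a tight loop and count readings that go backwards,
   which would suggest a corrupted result. */
static void *hammer(void *arg)
{
    long *bad = arg;
    assert(bad != NULL);
    long prev = get_tick_count();
    for (long i = 0; i < 200000; i++) {
        long now = get_tick_count();
        if (now < prev)
            (*bad)++;
        prev = now;
    }
    return NULL;
}

/* Run the loop in two threads at once. Returns the total number of suspicious
   readings, or -1 if a thread could not be started. */
long stress_get_tick_count(void)
{
    pthread_t a, b;
    long bad_a = 0, bad_b = 0;
    if (pthread_create(&a, NULL, hammer, &bad_a) != 0)
        return -1;
    if (pthread_create(&b, NULL, hammer, &bad_b) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return bad_a + bad_b;
}
```

On a host the monotonic clock should never go backwards, so a non-zero result there would point at the harness rather than the clock. On the camera, a crash or a non-zero count would be the interesting outcome.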
====
[EDIT 2]
The SX260 triggered the same bug about an hour later:
ERROR: NULL error message
*** TERMINATED ***
Lapser: -9 3146878 0 0
The second number is a little different and looks like an address. Here's where "lapser2" comes from:
int lua_script_run(void)
{
    //snip
    Lres = lua_resume( Lt, top );
    lapser2=Lres;
Here are the values of lapser2:
SX50: 3209878 = 0x30FA96
SX260: 3146878 = 0x30047E
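As a quick sanity check on those numbers, here is a small host-side sketch that prints a debug value in hex and tests whether it is one of the status codes lua_resume() is documented to return in Lua 5.1 (0 for success, LUA_YIELD = 1, and the error codes 2 through 5). The helper names and the ST_ constants are made up for this example, and it assumes the Lua in the build uses the stock 5.1 values.

```c
#include <stdio.h>

/* Status values as defined in stock Lua 5.1 lua.h (assumed unchanged here). */
enum {
    ST_OK = 0, ST_YIELD = 1, ST_ERRRUN = 2,
    ST_ERRSYNTAX = 3, ST_ERRMEM = 4, ST_ERRERR = 5
};

/* Returns 1 if res is one of the documented lua_resume() status codes. */
int is_known_lua_status(int res)
{
    return res >= ST_OK && res <= ST_ERRERR;
}

/* Formats a debug value as hex into buf and returns buf. */
const char *debug_hex(long value, char *buf, size_t len)
{
    snprintf(buf, len, "0x%lX", (unsigned long)value);
    return buf;
}
```

Both logged values fall outside the 0 to 5 range, which fits the impression that lapser2 holds something other than a normal status code.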