Shot Histogram Request - page 25 - CHDK Releases - CHDK Forum  

Shot Histogram Request

  • 467 Replies
  • 129027 Views
*

Offline lapser

  • *****
  • 1093
Re: Shot Histogram Request
« Reply #240 on: 08 / April / 2013, 02:52:33 »
Advertisements
One more thing I've been wondering about. Do you think it's possible that resume() could occur in the same action stack cycle after the yield()? Would that cause a problem if it did?
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

*

Offline reyalp

  • ******
  • 14082
Re: Shot Histogram Request
« Reply #241 on: 08 / April / 2013, 03:12:23 »
Do you think it's possible that resume() could occur in the same action stack cycle after the yield()?
Off the top of my head, I can't think of a way that would occur in the current chdk code. Each kbd_task iteration should only call lua_script_run once.
Quote
Would that cause a problem if it did?
It shouldn't matter at all. Lua yield and resume are really just regular C function, they don't know anything about OS level tasks.

Thank you for persisting with your investigation of this BTW. It's very frustrating to have such a serious bug and not be able to reproduce it.
Don't forget what the H stands for.

*

Offline lapser

  • *****
  • 1093
Lua bug solved (again!)
« Reply #242 on: 09 / April / 2013, 13:03:14 »
OK, I really think I got it this time . :D

First of all, I think the Lua error was triggered by CHDK calling resume() without a yield(). I think this happened because the action stack cleared abnormally before action_stack_AS_LUA_WAIT_CLICK() was finished. Here's the relevant code:

Code: [Select]
static int action_stack_AS_LUA_WAIT_CLICK()
{
lapser1=-9;
    // Check key pressed or timeout
    if ((get_tick_count() >= action_top(2)) || camera_info.state.kbd_last_clicked)
    {
        // If timed out set key state to "no_key", otherwise key pressed so set last checked time
        if (!camera_info.state.kbd_last_clicked)
            camera_info.state.kbd_last_clicked=0xFFFF;
        else
            camera_info.state.kbd_last_checked_time = camera_info.state.kbd_last_clicked_time;
lapser1=9;
        action_pop_func(1);
        return 1;
    }

    return 0;
}

static int luaCB_wait_click( lua_State* L )
{
lapser1=8;
    int delay = luaL_optnumber( L, 1, 0 );
    //if (delay == 0) delay = -1;
    if(delay<2)delay--; //delay 1 for no delay or 0 for no timeout 
    delay = sleep_delay(delay);
    // Reset 'clicked' key if it has not changed since last time
    if (camera_info.state.kbd_last_clicked_time <= camera_info.state.kbd_last_checked_time)
    {
        camera_info.state.kbd_last_clicked = 0;
    }
    // Set up for wait or click testing
    action_push(delay);
    action_push_func(action_stack_AS_LUA_WAIT_CLICK);

    // Check for short delay or key already pressed by calling action stack routine once now
    if (action_stack_AS_LUA_WAIT_CLICK() == 0)
    {
        return lua_yield( L, 0 );
    }
    //return 1;
    return 0;
}

static int luaCB_is_key( lua_State* L )
{
lapser1=7;
  lua_pushboolean( L, script_key_is_clicked(lua_get_key_arg( L, 1 )));
  return 1;
}

  repeat wait_click(10)
    if(is_key("set"))then
[snip series of is_key calls]
  until get_shot_ready()

[Lua error was exactly the same as yesterday on the same camera, sx50)
ERROR: NULL error message
*** TERMINATED ***
Lapser:     -9 3209878     0     0

So the only way lapser1=-9 can happen is if  action_stack_AS_LUA_WAIT_CLICK() returns "1". In that case, luaCB_wait_click( lua_State* L ) calls  lua_yield(). Lua should not resume until  lapser1=9, but it did. This means the action stack ended abnormally.

If wait_click(10) had returned, the script calls is_key("set"), and lapser1 is set to 7. This is what happened before the lua update, and the error message was "set". The Lua update added a check having to do with stack overflow (I think), so maybe it no longer would resume() without a yield like it did before, and terminated the script in wait_click(10). Hence, lapser1=-9.

So the thing to figure out is how action stack could think it's finished. It happens very rarely, but the script is gradually changing the shutter time, which may be triggering an interrupt at gradually changing times from the last tick. So if another task is calling the same action stack routine as the keyboard task, you'd have a thread safety issue that might be the cause.

So do you think this could be happening? Is action stack called from a different task?
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

*

Offline ahull

  • *****
  • 634
Re: Shot Histogram Request
« Reply #243 on: 09 / April / 2013, 14:36:36 »
Sure. Give me the models and firmware versions and I'll post builds for you.

I've attached the latest script and ixus40 build
If you have merged back in to the "bleeding edge" build, does that mean I can try you script out on a couple of other models now?

That version of the script crashes on the ixus 40 at startup with :1: unexpected symbol near '0'
*** TERMINATED ***
Lapser:   0 0 0 0

I suspect I have failed to copy over some files.. I'll check.

« Last Edit: 09 / April / 2013, 14:39:30 by ahull »


*

Offline lapser

  • *****
  • 1093
Re: Shot Histogram Request
« Reply #244 on: 09 / April / 2013, 15:08:26 »
I suspect I have failed to copy over some files.. I'll check.
Maybe. Be sure to copy all the files in the zip. Lua is in a module, which changes with each update.
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

*

Offline reyalp

  • ******
  • 14082
Re: Shot Histogram Request
« Reply #245 on: 09 / April / 2013, 16:47:17 »
First of all, I think the Lua error was triggered by CHDK calling resume() without a yield().
This should not be possible. It would mean either that multiple tasks were running the action stack, or that something within a lua C function called lua_script_run.

However, you could test what happens if you call resume without a yield by adding code to force that.
Quote
I think this happened because the action stack cleared abnormally before action_stack_AS_LUA_WAIT_CLICK() was finished.
I don't understand how this could happen. kbd_task isn't going to run another iteration until the call returns, which means that lua has yielded. The entire kbd_task including all the lua code and action stack functions run sequentially. Once lua has been entered, action stack processing does not happen again until lua has yielded or exited. The only way to get back into lua is if action_stack_AS_SCRIPT_RUN is on top of the stack.

Perhaps I don't understand what you are trying to say.
Quote
So do you think this could be happening?
No.
Quote
Is action stack called from a different task?
Not unless something is very badly wrong, and if it did, it would lead to near instant corruption of the lua state, which would very quickly cause crashes and other bad behavior. The exact nature of the bad behavior would be highly dependent on timing and memory organization, so it would be exceedingly unlikely to show the exact same symptom across different builds, cameras and scripts.

Looking at your debug code:
If action_stack_AS_LUA_WAIT_CLICK() returns 0, then you can yield with labser1 == -9

lua_yield will eventually make lua_resume return, so it's quite possible that the error occurs after that yield.

You could add new lapser1 value right before the return 0 to check for this.
Don't forget what the H stands for.

*

Offline lapser

  • *****
  • 1093
Re: Shot Histogram Request
« Reply #246 on: 09 / April / 2013, 17:28:11 »
However, you could test what happens if you call resume without a yield by adding code to force that.
I was trying to do that through action stack, but just crashed the camera.
Quote
Looking at your debug code:
If action_stack_AS_LUA_WAIT_CLICK() returns 0, then you can yield with labser1 == -9
lua_yield will eventually make lua_resume return, so it's quite possible that the error occurs after that yield.
Yes, I know it's calling lua_yield() with lapser1==-9. But that also means that AS_LUA_WAIT_CLICK() is still on the action stack. It should continue executing that function until AS_LUA_WAIT_CLICK() times out (next cycle), and calls POP_FUNC. But it also sets lapser1=9 when it does that.

So the question is, how does it get to lua_resume() without lapser1 being set to 9? Forget all my other theories. This is the problem I can't figure out.

I also don't understand why there are multiple action stacks, and process_all(). It looks like action_stack_AS_SCRIPT_RUN() is in a different action stack? This is what calls lua_script_run() which calls lua_resume() and throws the error. But if lapser1==-9, then AS_LUA_WAIT_CLICK() should still be on the "active stack." So maybe the active stack got changed when it shouldn't have?

This is all an AS_HEADACHE(), but I hope we're figuring it out.
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

*

Offline philmoz

  • *****
  • 3450
    • Photos
Re: Shot Histogram Request
« Reply #247 on: 09 / April / 2013, 17:38:43 »
So the question is, how does it get to lua_resume() without lapser1 being set to 9? Forget all my other theories. This is the problem I can't figure out.

It doesn't 'get' to lua_resume - the call to 'luaCB_wait_click' was made from within a previous call to 'lua_resume' from the last time Lua regained control.

When luaCB_wait_click calls lua_yield, it causes control to return back out of the previous lua_resume call - at this point it is getting the bad status value in lua_script_run.

So sometime after Lua resumed executing the script code, up to the point where luaCB_wait_click calls lua_yield, something corrupted the lua state.

It's not necessarily even related to luaCB_wait_click - it could have been any of script statements run within the lua timeslice that caused the problem, or a bug in Lua itself.

Given the random nature of the errors I wonder if it's a problem with the Lua garbage collector.

Phil.
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)
  g7x2 (1.01a, 1.01b, 1.10b)


*

Offline lapser

  • *****
  • 1093
Re: Shot Histogram Request
« Reply #248 on: 09 / April / 2013, 18:44:47 »
It doesn't 'get' to lua_resume - the call to 'luaCB_wait_click' was made from within a previous call to 'lua_resume' from the last time Lua regained control.

Given the random nature of the errors I wonder if it's a problem with the Lua garbage collector.
OH, I see it now (finally). When wait_click() calls lua_yield(),  lua then RETURNS from resume() (with lapser1==-9), and then CHDK enters action stack. When it finishes action stack, it then calls resume() (with lapser1==9).

So lapser1==-9 just means wait_click yielded, instead of returning.

Since I don't get the bug when waiting with wait_shot_ready(), I'm assuming my code isn't trashing the Lua state. I like your garbage collection suggestion. I know it's incremental in Lua, so it could be happening any time.

Since lapser1==-9 only happened after the update, and the update added a Lua stack overflow check, I wonder if it's stack related. Maybe garbage collection uses a lot of extra stack?

Is there a separate Lua stack? Is there any way to increase its size?

[EDIT] I think I just answered my own question. There is a Lua stack. I've increased it, and I'll try another time lapse test tonight to see if that fixes the Lua yield bug. See this post for more details:

http://chdk.setepontos.com/index.php?topic=9607.msg99095#msg99095
« Last Edit: 09 / April / 2013, 20:15:50 by lapser »
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

*

Offline lapser

  • *****
  • 1093
Re: Shot Histogram Request
« Reply #249 on: 09 / April / 2013, 20:23:20 »
This is the time lapse from last night with the G1X (which didn't trigger the yield bug). I think my exposure smoothing routine worked really well. I need to generalize it so it will work with intervals larger than 1 shot per second, though. Rather than try to add the smoothing to CHDK so I can smooth before rounding, I think I'll add a scale option to get_shot_meter() so it can return tv96*1000, for more accuracy in computing the smoothing.

Details about the video are in the description on YouTube
http://www.youtube.com/watch?v=qhkVtBtY3t0#ws
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

 

Related Topics