Shot Histogram Request - page 13 - CHDK Releases - CHDK Forum supplierdeeply

Shot Histogram Request

  • 467 Replies
  • 129013 Views
*

Offline reyalp

  • ******
  • 14082
Re: Shot Histogram Request
« Reply #120 on: 04 / March / 2013, 17:18:11 »
Advertisements
I switched to the new wait_click(10) in my loop waiting for get_shot_ready() instead of using 3 is_press() calls with "set", "menu", and "display". I was getting script interrupted errors once in awhile, but did not have one with the new wait_click until last night. Instead of a "set" error, I got a "NULL" error with the sx260.
Was the error "ERROR:NULL error message" ?

This certainly seems like our old friend http://chdk.setepontos.com/index.php?topic=8273.0

There are relatively few ways a Lua script can end. If you look at the patch I posted in http://chdk.setepontos.com/index.php?topic=8273.msg87882#msg87882 it covers the area in question (note this patch will probably not work as-is with the current trunk)

As I posted in http://chdk.setepontos.com/index.php?topic=8273.msg87915#msg87915 the results of this were very strange and I couldn't really make sense of it. I don't completely remember what all I'd figured out from this, I'll have go dig around and see if what I can get out of the notes and dumps.

Possibly making a complete memory dump at the time the abnormal termination happens would shed some light on it, though it could be challenging to interpret.
Don't forget what the H stands for.

*

Offline lapser

  • *****
  • 1093
Re: Shot Histogram Request
« Reply #121 on: 08 / March / 2013, 14:45:22 »
Was the error "ERROR:NULL error message" ?
Yes, that's the one. I've been able to reproduce the bug once a night on the sx260, and I'm tracking it down slowly but surely. The bug happens when lua calls wait_click(10) or sleep(10). I set a variable to a different number in the C code for each function that my script calls, and sleep(10) is the culprit with this in the script:

repeat sleep(10) until get_shot_ready()

I did a lot of investigating of the keyboard functions. They aren't totally thread safe, but I couldn't show that was causing a problem.

I also dug into lua as far as I could get, and came up with a possibility. The sleep(10) function calls   lua_yield( L, 0 ).   lua_count_hook () also calls lua_yield(), which creates the possibility of a double yield with no resume. There's a test for that, but it might be fooled under rare timing conditions (I think?).

Anyway, scripts have a voluntary yield when they call sleep(10), and a preemptive yield from lua_count_hook(). My script doesn't need the preemptive yield, since it spends most of its time in a sleep(10) loop. So I'm trying disabling the preemptive yield with:

set_yield(-1,-1)

The script works fine this way. In fact, this reduces the time waiting for the script to set the exposure between shots to 10 msec each time, instead of 10 or 20 msec. I'll keep it in the script even if it doesn't fix the lua error bug. But I've got my fingers crossed that this is the problem, and I won't see that error again!

Last Wednesday, I caught a pretty nice sunset time lapse from the top of Spencer's Butte with the SX260, but the lua error stopped it while the clouds were still lit up. Here's the latest tests:

http://www.youtube.com/watch?v=6DWLPU0y4Ig#ws

The G1X doesn't seem to trigger the bug (yet), and I only saw it once with the SX50. Here's the latest test with these two cameras:

http://www.youtube.com/watch?v=HA7fh8viei0#ws

http://www.youtube.com/watch?v=hkBVi7MKB78#ws
« Last Edit: 08 / March / 2013, 18:43:06 by lapser »
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

*

Offline lapser

  • *****
  • 1093
set_yield(-1,-1) didn't work
« Reply #122 on: 08 / March / 2013, 22:42:57 »
Tonight's test with set_yield(-1,-1), didn't work. That is, the SX260 had one script interrupted error, and the G1X had 2.

So the next step is to take the sleep(10) out of the script wait loop, and use the  lua_count_hook () function to do the yield every 10 ms. That is:

repeat until get_shot_ready()

I shot over 5,000 pictures of the tv screen this afternoon with the sx260, and no error. The errors all have happened as it gradually gets darker, and the shutter open time gets to a certain point, longer than 10 msec. So maybe I'll try changing the script so it changes tv96 by 1 each time, and see if I can trigger the error more often.
 
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

*

Offline lapser

  • *****
  • 1093
Script interrupted solution?
« Reply #123 on: 09 / March / 2013, 08:34:27 »
I thought about it some more, and eliminating the sleep(10) from the wait loop is a bad idea. It would hog the cpu, and still probably not solve the problem.

But the solution is relatively simple: add this function:

wait_shot_ready()

The bug occurs when lua is in a yield / resume loop. Something about the timing causes an error with resume. wait_shot_ready() uses only one yield at the beginning, and a single resume at the end.  The timing of the resume is well after the shutter open/close cycle is complete, so that shouldn't cause a problem. I think it's going to work, but we'll see tonight.

Here are the time lapse videos from last night (with script interrupt/restart glitches)

http://www.youtube.com/watch?v=G_Jspt9xvH4#ws

http://www.youtube.com/watch?v=oFl-5baRxcw#ws
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos


*

Offline lapser

  • *****
  • 1093
Re: Shot Histogram Request
« Reply #124 on: 09 / March / 2013, 14:07:09 »
wait_shot_ready() is done and working.

I also changed set_backlight(..) so it saves its state in "bldark". Then in the action stack AS_... function for sleep(), wait_click(), and wait_shot_ready(), I added:

  if (bldark) TurnOffBackLight();

I also added code to turn it back on when the script ends or errors out. All this makes set_backlight(0) almost work, except for the flashing between shots. It still should save battery power.

So I'm all set for another test tonight, with high hopes there won't be any script interrupted errors.

EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

*

Offline reyalp

  • ******
  • 14082
Re: Shot Histogram Request
« Reply #125 on: 09 / March / 2013, 15:39:50 »
The bug occurs when lua is in a yield / resume loop. Something about the timing causes an error with resume.
You know this how?

The fact that you get "ERROR:NULL error message" means Lua is terminating "normally", in the sense that lua_resume is returning based on an error status from resume_error in lib/lua/ldo.c.

If you look at the patch I linked earlier, the case being hit was the one that generated 'A/LUARES1.DMP'

The bizarre thing is that luaD_rawrunprotected doesn't look like it is creating an error status.
Don't forget what the H stands for.

*

Offline lapser

  • *****
  • 1093
Re: Shot Histogram Request
« Reply #126 on: 09 / March / 2013, 16:51:42 »
The bug occurs when lua is in a yield / resume loop. Something about the timing causes an error with resume.
You know this how?
I set a variable, "lapser1", to a different value in every luascript.c function that my script calls. The error only occurs in the sleep() function that is called repeatedly waiting for get_shot_ready() to signal that the shot histogram is done.

I can't seem to trigger the bug except when doing a time lapse, as the shutter time is slowly increased. The fastest shutter speed it has happened at is 10 msec.

repeat sleep(10) until get_shot_ready()   yields and resumes around 60 times per shot with a 1 second interval.
wait_shot_ready() yields only once per shot. Since the bug happens after a resume, I'm hoping that eliminating all but one resume will prevent it.
Quote
The bizarre thing is that luaD_rawrunprotected doesn't look like it is creating an error status.
Could this happen with a hardware interrupt inside an internal lua function that isn't thread safe?

Do you know how the shutter open/close is timed? Maybe there's an extra interrupt timer that generates an extra interrupt when its time to close the shutter? Since there's a firmware function to open and close the mechanical shutter, the timing might be controlled by the camera processor instead of a hardware timer. I'm just shooting in the dark here, but you know me and my theories.

If I trigger the bug again, can you describe how to do a memory dump?

Thanks for the help.
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

*

Offline reyalp

  • ******
  • 14082
Re: Shot Histogram Request
« Reply #127 on: 09 / March / 2013, 17:29:43 »
Could this happen with a hardware interrupt inside an internal lua function that isn't thread safe?
A hardware interrupt should be completely transparent to Lua, just like any other vanilla C code. I don't see how thread safety wouldn't come into it, the Lua code is single threaded since it executes entirely in kbd_task.

One possible area that might be suspect is the implementation of setjmp/longjmp (which are custom, in lib/lua/setjmp.S), but I couldn't see any way this would be affected.

The combination of the debug hook yield and sleep yield also seems like it might be worth further investigation. One way to do this might be to make the debug hook yield more frequently and see if the problem happens more often.

If this problem is in fact the problem encountered earlier, it is not strictly tied to shooting. The fact that you get the NULL error message error suggests that it is the same, or at least closely related.
Quote
Do you know how the shutter open/close is timed?
No. Interrupts are probably involved, see http://chdk.setepontos.com/index.php?topic=8312.msg87949#msg87949

Quote
Maybe there's an extra interrupt timer that generates an extra interrupt when its time to close the shutter?
I don't still don't see how interrupts would be involved. But then, I don't have a good theory of how this happens... or a bad one, for that matter :-[

Quote
If I trigger the bug again, can you describe how to do a memory dump?
If you want to dump the entire camera RAM, see main.c dump_memory() (used by the debug menu memory dump option)

I would suggest putting it as close as possible to where the Lua error state is triggered, probably something like
Code: [Select]
   status = luaD_rawrunprotected(L, resume, L->top - nargs);
   if (status != 0) {  /* error? */
+    if(status != LUA_YIELD) {
+       dump_memory();
+    }
     L->status = cast_byte(status);  /* mark thread as `dead' */

in ldo.c lua_resume

Note that doing something useful with this will be decidedly non-trivial. It will also dump memory on any Lua error...
Don't forget what the H stands for.


*

Offline lapser

  • *****
  • 1093
Re: Shot Histogram Request
« Reply #128 on: 09 / March / 2013, 19:16:24 »
The combination of the debug hook yield and sleep yield also seems like it might be worth further investigation. One way to do this might be to make the debug hook yield more frequently and see if the problem happens more often.
Except the bug still happened with:
set_yield(-1,-1)

My current script keeps the yield hook disabled, and there's only 1 yield per shot at the beginning of my new wait_shot_ready().

If the bug happens again, assuming I can restrain the urge to throw my camera against the wall, I'll look more at the gui_handler stuff. Is that called from the keyboard task? I'm wondering if script_gui_handler could be generating a spurious shoot_full click that's interrupting the script when it's not in the right state or something. If get_clicked_key() or similar is called outside the keyboard task, that could cause trouble.
Quote
If this problem is in fact the problem encountered earlier, it is not strictly tied to shooting. The fact that you get the NULL error message error suggests that it is the same, or at least closely related.
Actually, the null error message was when I was using wait_click(10) for my delay. With sleep(10), the error message is the last string passed to is_key().

With is_key("set") before the sleep(10), the error was "set".
With is_key("up") the error is "up".

I also got an error of "0" on one camera with wait_click(10), instead of "NULL".

I think it's using what was on the stack before the yield as the error. It sounds like a stack error of some kind.
EOS-M3_120f / SX50_100b / SX260_101a / G1X_100g / D20_100b
https://www.youtube.com/user/DrLapser/videos

*

Offline reyalp

  • ******
  • 14082
Re: Shot Histogram Request
« Reply #129 on: 09 / March / 2013, 19:49:42 »
If the bug happens again, assuming I can restrain the urge to throw my camera against the wall, I'll look more at the gui_handler stuff. Is that called from the keyboard task?
Yes, it should be.
Quote
I'm wondering if script_gui_handler could be generating a spurious shoot_full click that's interrupting the script when it's not in the right state or something. If get_clicked_key() or similar is called outside the keyboard task, that could cause trouble.
If the script were somehow being interrupted by a spurious shoot click, then you wouldn't get the bogus or null error messages. It would just act like you had pressed the shutter. When the shutter is pressed, it triggers kbd_task to kill the Lua environment *outside* of the Lua code (meaning lua has already returned from lua_resume, and probably waited for the next iteration of kbd_task)

Quote
I think it's using what was on the stack before the yield as the error. It sounds like a stack error of some kind.
The error comes form the lua state stack, not the C stack. Lua is dying in some way that returns an error (edit: I mean, a code path that would normally be an error), but fails to put an error on the (edit: Lua) stack.
« Last Edit: 09 / March / 2013, 19:51:44 by reyalp »
Don't forget what the H stands for.

 

Related Topics