
help: how to debug script interrupting?

*

Offline reyalp

  • ******
  • 13619
Re: help: how to debug script interrupting?
« Reply #180 on: 04 / May / 2013, 19:34:14 »
I ran 10k noshoots with an un-patched trunk build from my build environment. No errors.
Don't forget what the H stands for.

*

Offline srsa_4c

  • ******
  • 4437
Re: help: how to debug script interrupting?
« Reply #181 on: 04 / May / 2013, 20:42:37 »
However, I got an error in 850 cycles on the ixus110_sd960 (r31, same generation as the d10). Perhaps we should collect the affected and (presumably) not affected models somewhere, maybe there's a pattern.
What source branch / and version was this? Did it include my debug patch ?
It was remote capture test v3, 2714 + patch (I haven't checked whether your patch relied on some newer changes), then it was locally updated to 2751 (don't remember when I did the update).

*

Offline reyalp

  • ******
  • 13619
Re: help: how to debug script interrupting?
« Reply #182 on: 04 / May / 2013, 22:12:14 »
So I thought I'd get back to basics and run the patch that reliably reproduced the problem

Currently at 6160 noshoots without hitting it once.  :blink:

Previously, running identical settings (as far as I know), it hit 8 times in less than 5k.

The only difference I know for sure is that the ambient temperature is quite a bit lower.  :haha

edit:
completed 10k without any errors

This is debug patch on latest trunk, I could try going to one of the exact trunk revs I used before, but the changes seem trivial compared to the changes I made in different versions of the debug code, not to mention the various versions outslider and lapser used....

edit:
in fact, the builds are pretty much identical, since the last one I repro'd with (using msgtest) was 2753
« Last Edit: 05 / May / 2013, 00:43:47 by reyalp »

Re: help: how to debug script interrupting?
« Reply #183 on: 05 / May / 2013, 00:06:34 »
Here's a g10 102a build with my debug patch.
Ran 1,000,000 iterations for the chdkptp test with this.

Misc Values Display :

MEM:   0xffffffff
B:       0
ZB:     0
USB:   0



« Last Edit: 05 / May / 2013, 00:12:42 by waterwingz »
Ported :   A1200    SD940   G10    Powershot N    G16


*

Offline reyalp

  • ******
  • 13619
Re: help: how to debug script interrupting?
« Reply #184 on: 05 / May / 2013, 02:13:45 »
edit:
completed 10k without any errors
Running the same build triggered the error again at ~500, using all the same settings except actually shooting. It subsequently triggered twice more using msgtest, in less than 30k iterations.

*

Offline srsa_4c

  • ******
  • 4437
Re: help: how to debug script interrupting?
« Reply #185 on: 05 / May / 2013, 10:43:34 »
- trying CHDK compiled as ARM
We can strike this from the list. Thanks to Phil's changes, I was able to try ARM CHDK on the ixus110, and the test still managed to trigger the error.

*

Offline reyalp

  • ******
  • 13619
Re: help: how to debug script interrupting?
« Reply #186 on: 05 / May / 2013, 14:59:59 »
- trying CHDK compiled as ARM
We can strike this from the list. Thanks to Phil's changes, I was able to try ARM CHDK on the ixus110, and the test still managed to trigger the error.
Thanks for testing that. What script did you use to test?

My main conclusion from my tests yesterday is that it's very difficult to decide conclusively whether a particular camera or build is affected. The same build can trigger once every few hundred iterations, or not trigger for 10k.
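To put a rough number on that variability, here is a minimal sketch in Python. It assumes errors are independent and occur at a constant per-iteration rate (a Poisson model), which is an assumption and not something established in this thread. The inputs are the figures reported earlier: 8 errors in under 5k noshoots (rounded to 5000 here) versus none in a later 10k run. The function name `prob_zero_errors` is made up for illustration.

```python
import math

def prob_zero_errors(errors_seen, iterations_seen, iterations_planned):
    """Chance of seeing no errors in a new run, assuming a constant error rate.

    Uses the Poisson model: P(0 events) = exp(-expected_events).
    """
    rate = errors_seen / iterations_seen      # errors per iteration
    expected = rate * iterations_planned      # expected errors in the new run
    return math.exp(-expected)

# 8 errors in roughly 5000 iterations, then a clean run of 10000
p = prob_zero_errors(8, 5000, 10000)
print(f"P(no errors in 10k at the earlier rate) = {p:.2e}")
```

If the earlier rate really applied, a clean 10k run would be extremely unlikely under this model, so the rate probably differs between runs. That fits with some uncontrolled factor being involved, but it does not say which one.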

While it's probably just seeing patterns in noise, the idea that it's correlated to temperature is still bugging me. Actually shooting heats things up quite a bit more than the half shoot. The errors seem to come much faster when the CCD temp is at 50c +. The times it triggered rapidly using the noshoot script, the ambient temp was a lot higher (~30c). Crazy? Probably  :-[

Re: help: how to debug script interrupting?
« Reply #187 on: 05 / May / 2013, 16:50:30 »
That would explain why I usually get errors after some time of shooting, usually not at start, when cam is cold.

Once I was shooting outside at night; I suppose the temp was around 15 or below. It was raining and the cam was covered only by a polybag with the lens outside the bag (stupid, I know). I'm not really sure, but I think I remember there were errors. I can't check this now since I have deleted all the photos. However, even if there were errors, I'm sure there were not as many of them as usual when I shoot from the room, where at night I even get 1 error per few photos...

Who'd like to test the script in a freezer and then in an oven? :)
if (2*b || !2*b) {
    cout<<question
}

Compile error: poor Yorick


*

Offline ahull

  • *****
  • 634
Re: help: how to debug script interrupting?
« Reply #188 on: 05 / May / 2013, 17:08:13 »
Who'd like to test the script in a freezer and then in an oven? :)

Not perhaps such a daft idea, however before I bake and freeze my Ixus (sounds pretty painful to me), could we not log the various temperatures (bat and CCD, (are there any others?)) as we test, to see if this is a possibility. At the very least such a test would give us something else we can eliminate, if they stay fairly constant, but we still see the error.

Outslider, bear in mind that the camera body probably provides quite good insulation from the outside ambient temp, so starting the camera up after it has been in a nice warm jacket pocket when you are out in the frozen wilds probably means it has a fairly high starting temp, and it may then be the case that it loses heat to its surroundings less quickly than it builds up from the electronics.
« Last Edit: 05 / May / 2013, 17:10:07 by ahull »

*

Offline reyalp

  • ******
  • 13619
Re: help: how to debug script interrupting?
« Reply #189 on: 05 / May / 2013, 17:26:23 »
Not perhaps such a daft idea, however before I bake and freeze my Ixus (sounds pretty painful to me), could we not log the various temperatures (bat and CCD, (are there any others?)) as we test, to see if this is a possibility.
The version of uint.lua I posted earlier logs the CCD temp. The other options are battery (not relevant to me since I'm using an external PSU, and I think the lipo temp sensors are actually on the battery, though I'm not certain), and optical.

I would take CCD temp as the closest proxy for CPU temp. I've attached a version that also logs the optical. While I could log these in the dump, it's easy to backtrack to within a couple of seconds in the log. Would help with the msgtest runs though, since there's no other log.

Attached version also includes optical.

In my last runs I think all the crashes happened around 50c, but most of the run was at that temp. I'll have a closer look through those logs in a bit.
Quote
so starting the camera up after it has been in a nice warm jacket pocket when you are out in the frozen wilds probably means it has a fairly high starting temp, and it may then be the case that it loses heat to its surroundings less quickly than it builds up from the electronics.
The camera also generates quite a lot of heat when shooting continuously, possibly more than the designers expected. It strikes me that low end cameras like the SX130 are likely to have less careful thermal design. The D10 is also probably relatively poor, since it's in a heavy sealed case. Big hunks like the G series would probably do better.

but... that really doesn't explain why it's so consistent hitting this one location. I have seen bad hardware and bad overclocks behave somewhat similarly though, so it may not be completely nuts.

edit:
Here's some data from my previous runs
First shoot runs, wasn't logging temp. I noticed that the optical temp on the OSD was quite high though (I think 39c), which prompted me to add temp logging in the later versions. Ref this post http://chdk.setepontos.com/index.php?topic=8273.msg99988#msg99988 also the end of this run was late at night, when temps were lower and there were no errors in the last 3600 shots.

First noshoot run ref http://chdk.setepontos.com/index.php?topic=8273.msg100092#msg100092 here's the table with CCD temp added
err    elapsed time  dt        shot  ds    CCD
start  00:00:00      00:00:00     0     0  44
1      00:34:01      00:34:01  1404  1404  49
2      00:37:20      00:03:19  1574   170  50
3      00:47:29      00:10:09  1994   420  50
4      00:51:39      00:04:10  2188   194  50
5      00:55:03      00:03:24  2333   145  50
6      01:16:50      00:21:47  3242   909  50
7      01:26:35      00:09:45  3647   405  50
8      01:55:47      00:29:12  4858  1211  50
end    01:57:48      00:02:01  4942    84  50
So first error was at 49c and it was steady at 50 after that. It reached 49c on image 1000, and 50c on image 1703
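The dt and ds columns in the table above are just differences between consecutive rows, so they can be recomputed from the elapsed time and shot count. The following is a minimal Python sketch with the values typed in by hand from the table; the names `ROWS`, `parse_hms` and `intervals` are made up for illustration and are not part of any script posted in this thread.

```python
from datetime import timedelta

# (label, elapsed time "HH:MM:SS", shot count, CCD temp in C), copied from the table
ROWS = [
    ("start", "00:00:00", 0, 44),
    ("1", "00:34:01", 1404, 49),
    ("2", "00:37:20", 1574, 50),
    ("3", "00:47:29", 1994, 50),
    ("4", "00:51:39", 2188, 50),
    ("5", "00:55:03", 2333, 50),
    ("6", "01:16:50", 3242, 50),
    ("7", "01:26:35", 3647, 50),
    ("8", "01:55:47", 4858, 50),
    ("end", "01:57:48", 4942, 50),
]

def parse_hms(text):
    """Convert an "HH:MM:SS" string to a timedelta."""
    h, m, s = (int(part) for part in text.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)

def intervals(rows):
    """Return (label, dt, ds, ccd) for every row after the first."""
    result = []
    for prev, cur in zip(rows, rows[1:]):
        dt = parse_hms(cur[1]) - parse_hms(prev[1])  # time since previous row
        ds = cur[2] - prev[2]                        # shots since previous row
        result.append((cur[0], dt, ds, cur[3]))
    return result

gaps = intervals(ROWS)
# Shots between errors; the "end" row marks the end of the run, not an error
error_gaps = [ds for label, _, ds, _ in gaps if label != "end"]
mean_gap = sum(error_gaps) / len(error_gaps)

for label, dt, ds, ccd in gaps:
    print(f"{label:>5}  dt={dt}  ds={ds:>4}  CCD={ccd}c")
print(f"mean shots between errors: {mean_gap}")
```

The gaps range from 145 to 1404 shots, which illustrates how much a single run can vary even while the CCD temperature stays within a degree.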

On the autobuild, noshoot run that didn't trigger the bug:
reached 50c at photo 3231, 51c at 5907

so no strong correlation there, although at a lower ambient temp I guess CPU might be a little cooler relative to CCD.

I don't seem to have saved the log from the run I did with a stock trunk build from my source tree.

For the run with shooting that triggered the bug again:
CCD hit 50 on photo 256, 51 on 457, first error at shot 318

The msgtest runs were immediately after that, while the CCD was ~49c. It hit 4 times in 100k msgtest.

Looking at the above, I'd say this is inconclusive at best. I think some of the msgtest runs I did earlier where it triggered were not at super high temps.

edit:
updated script for better formatting in log

« Last Edit: 05 / May / 2013, 19:53:32 by reyalp »

 
