help: how to debug script interrupting? - page 13 - General Discussion and Assistance - CHDK Forum

help: how to debug script interrupting?

  • 224 Replies
  • 71303 Views
Re: help: how to debug script interrupting?
« Reply #120 on: 25 / April / 2013, 23:22:44 »
Advertisements
You *can* run out of memory due to uncollected garbage. This does not indicate the garbage collector is malfunctioning, since it has no idea how much memory is available
The amount of memory available varies from camera to camera.  As does the "Lua crashing bug".  Might not be related but lacking a better explanation its worth pursuing?
Ported :   A1200    SD940   G10    Powershot N    G16

*

Offline philmoz

  • *****
  • 3450
    • Photos
Re: help: how to debug script interrupting?
« Reply #121 on: 26 / April / 2013, 00:32:52 »
How have you determined that it isn't "doing garbage collection properly"?

The count and threshold values continue to rise and only occasionally will the go down.
When they do go down they don't go down as far as you would expect - i.e. not all garbage gets collected. They then start rising again.

If I add collectgarbage() in the loop then the count and threshold stay pretty constant.

I'm not sure this has anything to do with the problems being seen; but it can't hurt to try it.

In addition to running out of memory there is also a possible fragmentation issue - many of the Lua allocations are small (20-30 bytes) and there are thousands of them happening.

Phil.
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)
  g7x2 (1.01a, 1.01b, 1.10b)

*

Offline reyalp

  • ******
  • 14138
Re: help: how to debug script interrupting?
« Reply #122 on: 26 / April / 2013, 01:50:17 »
The amount of memory available varies from camera to camera.  As does the "Lua crashing bug".  Might not be related but lacking a better explanation its worth pursuing?
1) We have multiple dumps of memory info when the problem occurred. All show *plenty* of free memory. Example from lapsers last dump
Code: [Select]
      start_address     1933784 0x001d81d8
        end_address     5624400 0x0055d250
         total_size     3690616 0x00385078
     allocated_size     1472608 0x00167860
     allocated_peak     1560928 0x0017d160
    allocated_count        1910 0x00000776
          free_size     2218008 0x0021d818
free_block_max_size     2130760 0x00208348
   free_block_count         102 0x00000066
2) The expected (and observed) symptoms of running out of memory are different from the Lua "crash bug"
3) There is no obvious mechanism for running out of memory to cause the "crash bug" symptom, and as I explained before, if it were somehow triggered by running out of memory, it is highly improbably that other symptoms like firmware asserts would not be triggered.

So yeah, I don't have a good explanation for the crash bug, but I doubt very much that running out of memory is a factor.
« Last Edit: 26 / April / 2013, 01:58:17 by reyalp »
Don't forget what the H stands for.

*

Offline reyalp

  • ******
  • 14138
Re: help: how to debug script interrupting?
« Reply #123 on: 26 / April / 2013, 01:56:50 »
The count and threshold values continue to rise and only occasionally will the go down.
When they do go down they don't go down as far as you would expect - i.e. not all garbage gets collected. They then start rising again.
The Lua garbage collector does not run a full collection under normal operation, it runs a "step". You can tune how much is collected in a step. See http://www.lua.org/manual/5.1/manual.html#2.10
Quote
If I add collectgarbage() in the loop then the count and threshold stay pretty constant.
That's what you'd expect from a full collection.

Quote
In addition to running out of memory there is also a possible fragmentation issue - many of the Lua allocations are small (20-30 bytes) and there are thousands of them happening.
I'm not seeing how fragmentation would matter, it should be completely transparent to Lua. If this were the cause, then you'd expect the complexity of the Lua code to be a primary factor.

edit:
Like I said, I don't have any good ideas, so sure, throwing a full collection in there may change something that gives us a clue. I do think that straight running out of memory is pretty well eliminated though...
« Last Edit: 26 / April / 2013, 02:03:24 by reyalp »
Don't forget what the H stands for.

Re: help: how to debug script interrupting?
« Reply #124 on: 26 / April / 2013, 02:53:39 »
Hello!

Afterr adding collectgarbage() function the script was able to take 400+ photos this night without an interruption. So whatever is happening it is related to this. I'll keep testing this tonight, I'll remove collectgarbage() and add logging collectgarbage("count") to check what is the amount of memory taken by Lua.

Thank you!
if (2*b || !2*b) {
    cout<<question
}

Compile error: poor Yorick

*

Offline philmoz

  • *****
  • 3450
    • Photos
Re: help: how to debug script interrupting?
« Reply #125 on: 26 / April / 2013, 03:58:13 »
I'm not seeing how fragmentation would matter, it should be completely transparent to Lua. If this were the cause, then you'd expect the complexity of the Lua code to be a primary factor.

But it might matter to CHDK or the Canon firmware - significant fragmentation could prevent allocation of a large block of memory even though there is sufficient total free.

I'm partly clutching at straws here too - the symptoms being seen just don't make any sense and there doesn't seem to be a rational explanation.

Phil.
CHDK ports:
  sx30is (1.00c, 1.00h, 1.00l, 1.00n & 1.00p)
  g12 (1.00c, 1.00e, 1.00f & 1.00g)
  sx130is (1.01d & 1.01f)
  ixus310hs (1.00a & 1.01a)
  sx40hs (1.00d, 1.00g & 1.00i)
  g1x (1.00e, 1.00f & 1.00g)
  g5x (1.00c, 1.01a, 1.01b)
  g7x2 (1.01a, 1.01b, 1.10b)

*

Offline reyalp

  • ******
  • 14138
Re: help: how to debug script interrupting?
« Reply #126 on: 27 / April / 2013, 01:08:24 »
Afterr adding collectgarbage() function the script was able to take 400+ photos this night without an interruption. So whatever is happening it is related to this. I'll keep testing this tonight, I'll remove collectgarbage() and add logging collectgarbage("count") to check what is the amount of memory taken by Lua.
This is interesting.
Without using collectgarbage, how long did it typically take to get the error?
edit: Ok reading back you said 10-30 shots. That is much shorter than the other incidents of "script death" we've seen, so either this is something different, or you have found a good way to trigger it. I'm trying the script on my D10...

Which firmware version of sx130 do you have, and what chdk build are you running on it?

I can make you a test build with patch mentioned here http://chdk.setepontos.com/index.php?topic=8997.msg99162#msg99162 which should tell us definitively if you are having the same problem lapser has been having.

It's also interesting to note that Alarik who started this thread was using sx130 as well


edit:
D10 ran for > 500 shots without any problems (without a collectgarbage). I'll try a longer run tomorrow.
« Last Edit: 27 / April / 2013, 03:45:22 by reyalp »
Don't forget what the H stands for.

*

Offline reyalp

  • ******
  • 14138
Re: help: how to debug script interrupting?
« Reply #127 on: 27 / April / 2013, 15:18:01 »
Good (?) news, I've reproduced this on D10 after approx 700 shots using outsliders script.

I also made the following change in print_data() to observe memory use (I forgot to put it in the log)
Code: [Select]
-   print()
+  print("Mem:",collectgarbage('count'),get_meminfo().free_size)
Watching this, it seemed like Lua use bounced around between 50 and 80 kb, so garbage collection appears to be functioning.

edit:
From the dump, free memory was 397992.  Canon meminfo (not using exmem)
Code: [Select]
         total_size     1608128 0x001889c0
     allocated_size     1195352 0x00123d58
     allocated_peak     1271680 0x00136780
    allocated_count        1742 0x000006ce
          free_size      397992 0x000612a8
free_block_max_size      365536 0x000593e0
   free_block_count         107 0x0000006b
total_size - allocated_peak = ~300k, so never got really close to running out.
allocated_count is 1366 on startup (with all the images on the card) so 1742 isn't particularly extreme.

From the script log, it died on shot 620, the last line in the log was  - got shooting:    4771440
FWIW, last nights run went longer than 620.
« Last Edit: 27 / April / 2013, 16:10:37 by reyalp »
Don't forget what the H stands for.

Re: help: how to debug script interrupting?
« Reply #128 on: 27 / April / 2013, 17:04:53 »
I tried last night and today and could not repeat the bug. I previously deleted all images from card, so maybe there is a connection between number of photos on card and this bug? (are some informations about images are stored in RAM?)

I have taken a lot of picture today and now I'm running the intervalometer to check whether there will be a bug.

I have previously said in this thread that the lapser's dummy script that performs only an empty loop also caused a lua crash on my sx130.

I have firmware 1.01c and CHDK #2724 from trunk.
if (2*b || !2*b) {
    cout<<question
}

Compile error: poor Yorick

*

Offline reyalp

  • ******
  • 14138
Re: help: how to debug script interrupting?
« Reply #129 on: 27 / April / 2013, 17:36:55 »
I tried last night and today and could not repeat the bug. I previously deleted all images from card, so maybe there is a connection between number of photos on card and this bug? (are some informations about images are stored in RAM?)
Yes each more images on the card uses a bit of RAM, but as I've said before all the dumps I've seen show plenty free at the time the bug was triggered.

You can add get_meminfo().free_size to your log to record how much is available.

I've attached a debug dump build for sx130 101c based on trunk 2732. Note that this will create dump files for *any* Lua error, so you might not want to run it for normal use.

If you hit the bug, it should create LUALOG.DMP which will have some information about the camera state when the bug happened. If you hit a different error, some other .DMP files will be generated too. If you are using this, you should also turn on "enable logging" in the miscellaneous->modules menu, and include the module log with the DMP file.

I've also attached my current debug patch for anyone who is interested.

I should really update the debug dump code to include the module list...

edit:
It's not urgent that you run this, assuming it's the same bug it probably won't give me more info than I already have here. It would be useful to know whether it is actually the same though.
« Last Edit: 27 / April / 2013, 17:43:24 by reyalp »
Don't forget what the H stands for.

 

Related Topics


SimplePortal © 2008-2014, SimplePortal