Lua performance testing

reyalp
« on: 11 / July / 2015, 16:18:06 »
Posting the results of some performance testing I did for future reference.

Note that most scripts don't need to worry about this. Unless your code is time critical and extremely complex and/or doing many thousands of iterations, you should just do whatever makes the code easy to work with. None of this is at all significant if you aren't using set_yield.

The following was done using D10.

Table, globals
Code: [Select]
=t={1} set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=t[1] end return get_tick_count()-t0
800

Table, locals
Code: [Select]
=local t={1} local j set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=t[1] end return get_tick_count()-t0
310

Function, globals
Code: [Select]
=set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=bitnot(1) end return get_tick_count()-t0
1090

Function, locals
Code: [Select]
=local bn=bitnot local j set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=bn(1) end return get_tick_count()-t0
780

So a simple C function call costs more than twice as much as an integer table lookup (keeping in mind that the loop itself has significant overhead). Globals carry substantial overhead, a well-known feature of Lua that follows from globals themselves being stored in a table.
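
To make the point about globals concrete, here is a minimal sketch in plain desktop Lua (5.1 semantics assumed; it does not use any CHDK functions). The names counter, floor and total are purely illustrative and not from the tests above.

```lua
-- Globals live in an ordinary table (_G), so reading one is a
-- string-keyed table lookup rather than a direct local access.
counter = 42                      -- creates the entry _G["counter"]
assert(_G["counter"] == 42)       -- same value, reached through the table

-- Caching a global function in a local pays the lookups once,
-- instead of on every iteration.
local floor = math.floor          -- one lookup of "math", one of "floor"
local total = 0
for i = 1, 1000 do
  total = total + floor(i / 2)    -- no global lookup inside the loop
end
print(total)
```

This is the same trick as the local bn=bitnot line in the "Function, locals" test.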

All of these are still very fast: even the slowest case is ~11 microseconds per iteration. So for the vast majority of scripts, this is not important.
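
For reference, the per-iteration figure is just the total divided by the loop count. A small sketch, meant for desktop Lua with floating-point numbers (CHDK Lua uses integers, so the division would truncate there); us_per_iteration is a made-up helper name.

```lua
-- Convert a benchmark total (ms for `iterations` passes) into
-- microseconds per pass. Loop overhead is still included.
local function us_per_iteration(total_ms, iterations)
  return total_ms * 1000 / iterations
end

print(us_per_iteration(1090, 100000))  -- slowest case above: function, globals
print(us_per_iteration(310, 100000))   -- fastest case above: table, locals
```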

I verified separately that Lua isn't smart enough to optimize away the unused assignment to j: making it a sum and returning it had relatively little impact on the results. I also verified that it isn't smart enough to optimize away the t[1], by initializing a table and making it use t[i] instead.

bitnot was chosen as a function that involves minimal C code. Substituting get_zoom_steps, which takes no parameters and just returns a C global, had a modest impact on the result (~160ms in the locals case).

Multiple runs of the tests showed essentially no variation.
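
If you want to check run-to-run variation yourself, something like the following sketch could be used. It is not the code used for the numbers in this post: bench, now, times and the fallback to os.clock() are my own assumptions so that it also runs in desktop Lua, where get_tick_count and set_yield do not exist.

```lua
-- Hypothetical helper: time `body` over `iterations` passes, `runs` times.
-- On CHDK, get_tick_count() gives milliseconds; elsewhere fall back to
-- os.clock(), which gives CPU seconds.
local now = get_tick_count or function() return os.clock() * 1000 end

local function bench(body, iterations, runs)
  if set_yield then set_yield(-1, -1) end  -- CHDK only, same call as in the tests above
  local results = {}
  for r = 1, runs do
    local t0 = now()
    for i = 1, iterations do body() end
    results[r] = now() - t0
  end
  return results
end

local t = {1}
local j
local times = bench(function() j = t[1] end, 100000, 3)
for r, ms in ipairs(times) do print(r, ms) end
```

Note that wrapping the body in a Lua function adds a function call to every iteration, so the numbers will be higher than the inline one-liners and are only comparable with each other.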

Camera comparison:
A540 (digic II) local table
Code: [Select]
=local t={1} local j set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=t[1] end return get_tick_count()-t0
620

ixus140 (digic IV) local table
Code: [Select]
=local t={1} local j set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=t[1] end return get_tick_count()-t0
310

All of the above were done in playback mode.
D10 local table, idling in rec mode
Code: [Select]
=local t={1} local j set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=t[1] end return get_tick_count()-t0
370
« Last Edit: 11 / July / 2015, 16:21:48 by reyalp »
Don't forget what the H stands for.

reyalp
Re: Lua performance testing
« Reply #1 on: 11 / July / 2015, 23:28:09 »
Some more:
A bare loop with no body
Code: [Select]
=set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do end return get_tick_count() - t0
120

A loop with an empty lua function.
Code: [Select]
=local f=function() end set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do f() end return get_tick_count() -  t0
670
Interestingly, an empty C function takes only 480.

Math performance in C.
We have often avoided FP math because it's entirely in software, but I've never really had a feeling for how bad it may be. Integer division is also in software.

For this test, I made lua functions that add, multiply or divide two C globals and store the result in another global.

This is not really a good test, since it's mostly Lua<->C overhead, and load/store overhead is likely to be significant for integers too. To measure this, I also included an empty function and one that just does an assignment.

As one would expect, the results (especially for division and multiplication) depend on the actual values: multiplying or dividing by 1 is fast.

Test code (after setting values to be operated on with test_seti and test_setf), using patch attached.
Code: [Select]
r={}
funcs={'nop','fdiv','fmul','fadd','fmov','idiv','imul','iadd','imov'}
set_yield(-1,-1)
for _,n in ipairs(funcs) do
 local f=_G['test_'..n]
 t0=get_tick_count()
 for i=0,100000 do
  f()
 end
 r[n] = get_tick_count()-t0
 sleep(50)
end
return r

Values used in the operations are given below as i1, i2, f1 and f2.
Results

i1 = i2 = f1 = f2 = 1
Code: [Select]
nop=480
fmov=480
fmul=520
fdiv=510
fadd=520
imov=480
iadd=480
imul=480
idiv=500
From this we can see that overhead is ~480, and load / store is insignificant. nop and mov are omitted from the following.

i1=f1=60000
i2=f2=123
Code: [Select]
fadd=540
fmul=540
fdiv=880
iadd=480
imul=480
idiv=560

i1=f1=60000
i2=f2=12345
Code: [Select]
fadd=550
fmul=540
fdiv=880
iadd=480
imul=480
idiv=520

i1=f1=1700000
i2=f2=3
Code: [Select]
fadd=540
fdiv=880
fmul=540
iadd=480
imul=480
idiv=610

Subtracting the overhead of 480, the fdiv above works out to 4 microseconds or 250k/sec. Integer division is faster.
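
The arithmetic, as a small sketch for desktop Lua with floating-point numbers. net_cost is an illustrative name; the inputs are the fdiv and idiv results from the last table and the ~480 overhead measured with nop.

```lua
-- Net cost of one operation after removing the fixed call overhead.
-- Returns microseconds per operation and operations per second.
local function net_cost(total_ms, overhead_ms, iterations)
  local us = (total_ms - overhead_ms) * 1000 / iterations
  return us, 1e6 / us
end

print(net_cost(880, 480, 100000))  -- fdiv
print(net_cost(610, 480, 100000))  -- idiv, slowest of the cases above
```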

Bottom line: FP math or integer division can have significant impact if thousands of operations are involved, but it's not so slow it needs to be avoided at all costs.

Equivalent of the last one in Lua
Code: [Select]
=local a=1700000 local b=3 local z set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do z=a+b end return get_tick_count() - t0
240

=local a=1700000 local b=3 local z set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do z=a*b end return get_tick_count() - t0
240

=local a=1700000 local b=3 local z set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do z=a/b end return get_tick_count() - t0
360
Given that ~120 is loop overhead, this is not bad.

Re: Lua performance testing
« Reply #2 on: 11 / July / 2015, 23:36:02 »
tl;dr:  FP math or integer division can have significant impact if thousands of operations are involved, but it's not so slow it needs to be avoided at all costs.
Ported :   A1200    SD940   G10    Powershot N    G16

reyalp
Re: Lua performance testing
« Reply #3 on: 07 / August / 2016, 19:18:45 »
G7X (digic 6) Lua results

tl;dr
For general Lua code, Digic 6 is 2x-3x faster than Digic 4

Table, globals
Code: [Select]
=t={1} set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=t[1] end return get_tick_count()-t0
1:return:360
Table, locals
Code: [Select]
=local t={1} local j set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=t[1] end return get_tick_count()-t0
2:return:130
Function, globals
Code: [Select]
=set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=bitnot(1) end return get_tick_count()-t0
3:return:520

Function, locals
Code: [Select]
=local bn=bitnot local j set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=bn(1) end return get_tick_count()-t0
4:return:390

Empty loop
Code: [Select]
=set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do end return get_tick_count() - t0
5:return:40

Empty function
Code: [Select]
=local f=function() end set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do f() end return get_tick_count() -  t0
6:return:300

Addition
Code: [Select]
=local a=1700000 local b=3 local z set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do z=a+b end return get_tick_count() - t0
7:return:100

Multiplication
Code: [Select]
=local a=1700000 local b=3 local z set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do z=a*b end return get_tick_count() - t0
9:return:100

Division
Code: [Select]
=local a=1700000 local b=3 local z set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do z=a/b end return get_tick_count() - t0
10:return:110


Re: Lua performance testing
« Reply #4 on: 07 / August / 2016, 20:18:26 »
Impressive.

Re: Lua performance testing
« Reply #5 on: 08 / August / 2016, 04:24:00 »
Code: [Select]
         
op                  A540(2)  ixus95(4)  ixus140(4)  D10(4)  sx50(5)  G7X(6)
Table, globals            -        580           -     800      500     360
Table, locals           620        220         310     310      190     130
Function, globals         -        770           -    1090      680     520
Function, locals          -        540           -     780      470     390
Empty loop                -         70           -     120       60      40
Empty function            -        450           -     670      390     300
Addition                  -        150           -     240      130     100
Multiplication            -        150           -     240      130     100
Division                  -        270           -     360      240     110
(- = no result reported)



The times are reported in milliseconds for 100,000 loop iterations; in other words, a value of 100 here is equivalent to each iteration taking 1 microsecond.

The benchmarks
These benchmarks are written in a form that you can copy and paste to a chdkptp command prompt.
The "=" prefix is a command to chdkptp to run the following as a lua command and return the output.
For more information see
https://app.assembla.com/spaces/chdkptp/wiki/CLI_Quickstart
Generally, after installing the libusb driver with zadig, start chdkptp, click Connect then paste the benchmarks below and click execute (or press return).

Table, Globals
Code: [Select]
=t={1} set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=t[1] end return get_tick_count()-t0
Table, locals
Code: [Select]
=local t={1} local j set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=t[1] end return get_tick_count()-t0
Function, globals
Code: [Select]
=set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=bitnot(1) end return get_tick_count()-t0
Function, locals
Code: [Select]
=local bn=bitnot local j set_yield(-1,-1) t0=get_tick_count() for i=1,100000 do j=bn(1) end return get_tick_count()-t0
Empty loop
Code: [Select]
=set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do end return get_tick_count() - t0
Empty function
Code: [Select]
=local f=function() end set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do f() end return get_tick_count() -  t0
Addition
Code: [Select]
=local a=1700000 local b=3 local z set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do z=a+b end return get_tick_count() - t0
Multiplication
Code: [Select]
=local a=1700000 local b=3 local z set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do z=a*b end return get_tick_count() - t0
Division
Code: [Select]
=local a=1700000 local b=3 local z set_yield(-1,-1) t0=get_tick_count() for i=0,100000 do z=a/b end return get_tick_count() - t0
The Cameras Tested
A540(2)
PowerShot A540
Digic II
http://chdk.wikia.com/wiki/A540

ixus95(4)
Powershot SD1200IS / IXUS95IS
Digic IV
http://chdk.wikia.com/wiki/SD1200

ixus140(4)
Powershot ELPH 130 IS / IXUS 140
Digic IV
http://chdk.wikia.com/wiki/ELPH130


D10(4)
Canon Powershot D10
Digic IV
http://chdk.wikia.com/wiki/D10

sx50(5)
Powershot SX50HS
Digic V
http://chdk.wikia.com/wiki/SX50HS

G7X(6)
Powershot G7X
Digic VI

Observations
The IXUS95 is about 40% faster than the other Digic IV cameras tested.
The sampled Digic 6 camera is 1.21 to 2.18 times faster than the Digic 5 camera, and division is vastly improved.
The SX50 is 1.13 to 1.17 times faster than the IXUS95, so there seems to be little speed improvement from Digic IV to Digic V.
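
For anyone who wants to redo the ratios, here is a rough sketch for desktop Lua. The timings are copied from the table above; speedup_range and the table names are just illustrative.

```lua
-- Timings (ms per 100,000 iterations) copied from the table above, in row
-- order: table globals, table locals, function globals, function locals,
-- empty loop, empty function, addition, multiplication, division.
local ixus95 = {580, 220, 770, 540, 70, 450, 150, 150, 270}
local sx50   = {500, 190, 680, 470, 60, 390, 130, 130, 240}
local g7x    = {360, 130, 520, 390, 40, 300, 100, 100, 110}

-- Smallest and largest speedup of `fast` over `slow` across all rows.
local function speedup_range(slow, fast)
  local lo, hi = math.huge, 0
  for i = 1, #slow do
    local ratio = slow[i] / fast[i]
    lo = math.min(lo, ratio)
    hi = math.max(hi, ratio)
  end
  return lo, hi
end

print(string.format("G7X vs SX50:    %.3f to %.3f", speedup_range(sx50, g7x)))
print(string.format("SX50 vs IXUS95: %.3f to %.3f", speedup_range(ixus95, sx50)))
```

The extremes come from function locals (smallest gain) and division (largest gain) for the G7X, which fits the remark about division.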
« Last Edit: 08 / August / 2016, 04:59:57 by jmac698 »

 
