This code contains the changes you proposed.
It consists of three packs of changes, roughly equal (I think) in size and influence:
- Rearranging the parser code to reduce the number of unneeded checks and to make detection of most simple operations constant-time (see the lookup-table sketch after this list). This makes parsing 4-6 times faster.
- Runtime compilation. If enabled by a #define, then after detecting a token the tokenizer checks whether there is enough room in the text and overwrites the token in place with three bytes: 0x01, size, token_id. The next time the tokenizer reads this statement, it no longer has to search for which token it is and how long it is. Even for rem. (A sketch follows this list.)
A minor but still safe addition: when a number's value is requested, I write it back into the text in the same way, to avoid repeating atoi.
This makes the second and later passes up to 30-40 times faster.
- Control-flow operator processing.
For loops and return I follow the same approach you proposed: I store the offset, but also the line number, since it must be tracked to correctly report the location of any error. I pack these into a single integer (16 bit + 16 bit) and call it a "mark" (see the sketch after this list).
For goto and gosub there is enough room right in the script (the size of "goto" + space + label), and we already have the infrastructure to write there (runtime compilation), so I write the mark immediately after the 0x01-size-token_id bytes.
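
To illustrate the constant-time detection from the first item, here is a minimal sketch (all names are mine, not the actual patch): a dispatch table indexed by the first character, so simple operators are recognized without walking the keyword list:

/* Sketch only: first-character dispatch so most simple tokens are
   recognized in O(1). Token names and helpers are hypothetical. */
enum { TOKEN_NONE, TOKEN_PLUS, TOKEN_MINUS, TOKEN_ASTR, TOKEN_SLASH,
       TOKEN_LEFTPAREN, TOKEN_RIGHTPAREN, TOKEN_EQ };

static const unsigned char single_char_token[256] = {
    ['+'] = TOKEN_PLUS,      ['-'] = TOKEN_MINUS,
    ['*'] = TOKEN_ASTR,      ['/'] = TOKEN_SLASH,
    ['('] = TOKEN_LEFTPAREN, [')'] = TOKEN_RIGHTPAREN,
    ['='] = TOKEN_EQ,
};

/* Hypothetical slow path: the old linear keyword search. */
static int find_keyword_token(const char *ptr)
{
    (void)ptr;
    return TOKEN_NONE; /* real search omitted */
}

static int get_next_token(const char *ptr)
{
    int t = single_char_token[(unsigned char)*ptr];
    if (t)
        return t;                    /* constant-time hit */
    return find_keyword_token(ptr);  /* keyword search only when needed */
}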
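
The in-place runtime compilation from the second item could look roughly like this; a sketch under my reading of the byte layout above (0x01 marker, one size byte, one token_id byte), with invented names. In the real patch this is guarded by the #define mentioned above, and number values can be stored back the same way:

#define PRECOMP_MARKER 0x01

/* After a token of 'size' chars was recognized as 'token_id',
   overwrite it right in the script text -- but only when it fits. */
static void precompile_token(char *text, int size, int token_id)
{
    if (size >= 3) {
        text[0] = PRECOMP_MARKER;
        text[1] = (char)size;      /* lets later passes skip the rest */
        text[2] = (char)token_id;
    }
}

/* Later passes check for the marker before doing any searching. */
static int try_read_precompiled(const char *text, int *size, int *token_id)
{
    if ((unsigned char)text[0] == PRECOMP_MARKER) {
        *size     = (unsigned char)text[1];
        *token_id = (unsigned char)text[2];
        return 1;                  /* no keyword search, no strlen */
    }
    return 0;                      /* fall back to normal tokenizing */
}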
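
And for the marks from the third item, packing offset and line number into one 32-bit integer might look like this (again a sketch with assumed names):

#include <stdint.h>
#include <string.h>

/* A "mark": script offset in the low 16 bits, line number in the
   high 16 bits, so a single integer restores both the jump target
   and the line used for error reporting. */
typedef uint32_t mark_t;

static mark_t make_mark(uint16_t offset, uint16_t linenum)
{
    return ((uint32_t)linenum << 16) | offset;
}

static uint16_t mark_offset(mark_t m)  { return (uint16_t)(m & 0xFFFFu); }
static uint16_t mark_linenum(mark_t m) { return (uint16_t)(m >> 16); }

/* For goto/gosub the four mark bytes fit right in the script, after
   the 0x01-size-token_id bytes, because "goto" + space + label is
   long enough. Something along these lines: */
static void store_goto_mark(char *text, mark_t m)
{
    memcpy(text + 3, &m, sizeof m);  /* text[0..2] = marker, size, id */
}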
Advantages of this solution:
- uBasic becomes as fast as Lua.
- We do absolutely nothing on the heap. uBasic stays tiny, with zero additional memory footprint.
- Even in the tokenizer almost nothing changed. It walks the same text in the same way, but it now checks quick hints first and recognizes a new kind of token (precompiled).
The disadvantage:
- The code is simply different. I think it is a little more complicated (yes, because we add some extra layers and "classes"), but the complexity does not increase much.
Personally, as a senior software team lead, I understand why you are very careful with changes that are unclear to you. I do the same at my job.
That's why I did not propose these changes before testing them in many ways myself.
Stability:
Most changes (everything except control flow) are isolated inside the tokenizer. The main uBasic code has no idea that anything changed.
Here is how I tested it:
a) A unit test that calls the tokenizer with tokens in direct, reverse, and random order. It passes.
b) A reverter that logs the tokenizer's answers (converting back from token_id to token_text) and compares the results of the first and subsequent passes against the original scripts (a round-trip sketch follows this list). Most of the scripts existing in the repository and on the forum were used.
c) Dumps of many complex scripts after precompilation, checked that everything is correct.
d) The same reverter used to log control-flow operations, plus test scripts with many for/if/while/goto/gosub etc. Everything works exactly as expected.
e) Simply running various scripts and checking that they work correctly.
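
Test (b) is essentially a round-trip check; in shape it is roughly this (a sketch with invented names, the real tokenizer loop omitted):

#include <assert.h>
#include <string.h>

/* Log one tokenizer pass: for every token, append its text form
   (token_id converted back to token_text) to 'out'. */
static void revert_pass(char *script, char *out, size_t outsize)
{
    out[0] = '\0';
    /* for each token: append token_text(token_id) to 'out' */
    (void)script; (void)outsize;
}

/* The first pass tokenizes (and precompiles in place); the second
   pass reads the precompiled tokens. Both must agree. */
static void roundtrip_test(char *script)
{
    static char pass1[65536], pass2[65536];
    revert_pass(script, pass1, sizeof pass1);
    revert_pass(script, pass2, sizeof pass2);
    assert(strcmp(pass1, pass2) == 0);
}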
After many iterations, these changes are stable.