Euphoria Performance Tips
In any programming language, and especially in Euphoria, you really have to make measurements before drawing conclusions about performance. Euphoria provides both execution-count profiling, as well as time profiling (DOS32 only). See refman.doc. You will often be surprised by the results of these profiles. Concentrate your efforts on the places in your program that are using a high percentage of the total time (or at least are executed a large number of times.) There's no point to rewriting a section of code that uses 0.01% of the total time. Usually there will be one place, or just a few places where code tweaking will make a significant difference. You can also measure the speed of code by using the time() function. e.g. atom t t = time() for i = 1 to 10000 do -- small chunk of code here end for ? time() - t You might rewrite the small chunk of code in different ways to see which way is faster.
Profiling will show you the hot spots in your program. These are usually inside loops. Look at each calculation inside the loop and ask yourself if it really needs to happen every time through the loop, or could it be done just once, prior to the loop.
Addition is faster than multiplication. Sometimes you can replace a multiplication by the loop variable, with an addition. Something like: for i = 0 to 199 do poke(screen_memory+i*320, 0) end forbecomes: x = screen_memory for i = 0 to 199 do poke(x, 0) x = x + 320 end for Saving Results in Variables
If you have a routine that is rather small and fast, but is called a huge number of times, you will save time by doing the operation in-line, rather than calling the routine. Your code may become less readable, so it might be better to in-line only at places that generate a lot of calls to the routine.
Euphoria lets you operate on a large sequence of data using a single statement. This saves you from writing a loop where you process one element at-a-time. e.g. x = {1,3,5,7,9} y = {2,4,6,8,10} z = x + yversus: z = repeat(0, 5) -- if necessary for i = 1 to 5 do z[i] = x[i] + y[i] end for In most interpreted languages, it is much faster to process a whole sequence (array) in one statement, than it is to perform scalar operations in a loop. This is because the interpreter has a large amount of overhead for each statement it executes. Euphoria is different. Euphoria is very lean, with little interpretive overhead, so operations on sequences don't always win. The only solution is to time it both ways. The per-element cost is usually lower when you process a sequence in one statement, but there are overheads associated with allocation and deallocation of sequences that may tip the scale the other way.
Euphoria automatically optimizes certain special cases. x and y below could be variables or arbitrary expressions. x + 1 -- faster than general x + y 1 + x -- faster than general y + x x * 2 -- faster than general x * y 2 * x -- faster than general y * x x / 2 -- faster than general x / y floor(x/y) -- where x and y are integers, is faster than x/y floor(x/2) -- faster than floor(x/y) x below is a simple variable, y is any variable or expression: x = append(x, y) -- faster than general z = append(x, y) x = prepend(x, y) -- faster than general z = prepend(x, y) x = x & y -- where x is much larger than y, -- is faster than general z = x & yWhen you write a loop that "grows" a sequence, by appending or concatenating data onto it, the time will, in general, grow in proportion to the square of the number (N) of elements you are adding. However, if you can use one of the special optimized forms of append(), prepend() or concatenation listed above, the time will grow in proportion to just N (roughly). This could save you a huge amount of time when creating an extremely long sequence. (You could also use repeat() to establish the maximum size of the sequence, and then fill in the elements in a loop, as discussed below.)
For greater speed, convert: left-hand-side = left-hand-side op expressionto: left-hand-side op= expressionwhenever left-hand-side contains at least 2 subscripts, or at least one subscript and a slice. In all simpler cases the two forms run at the same speed (or very close to the same).
Writing text to the screen using puts() or printf() is rather slow. If necessary, in DOS32, you can do it much faster by poking into the video memory, or by using display_text_image(). There is a very large overhead on each puts() to the screen, and a relatively small incremental cost per character. The overhead with exw (WIN32) is especially high (on Windows 95/98/ME at least). Linux and FreeBSD are somewhere between DOS32 and WIN32 in text output speed. It therefore makes sense to build up a long string before calling puts(), rather than calling it for each character. There is no advantage to building up a string longer than one line however. The slowness of text output is mainly due to operating system overhead.
Some common routines are extremely fast. You probably couldn't do the job faster any other way, even if you used C or assembly language. Some of these are:
Other routines are reasonably fast, but you might be able to do the job faster in some cases if speed was crucial. x = repeat(0,100) for i = 1 to 100 do x[i] = i end foris somewhat faster than: x = {} for i = 1 to 100 do x = append(x, i) end forbecause append() has to allocate and reallocate space as x grows in size. With repeat(), the space for x is allocated once at the beginning. (append() is smart enough not to allocate space with every append to x. It will allocate somewhat more than it needs, to reduce the number of reallocations.) You can replace: remainder(x, p)with: and_bits(x, p-1)for greater speed when p is a positive power of 2. x must be a non-negative integer that fits in 32-bits. arctan() is faster than arccos() or arcsin().
Euphoria's find() is the fastest way to search for a value in a sequence up to about 50 elements. Beyond that, you might consider a hash table (demo\hash.ex) or a binary tree (demo\tree.ex).
In most cases you can just use the shell sort routine in include\sort.e. If you have a huge amount of data to sort, you might try one of the sorts in demo\allsorts.e (e.g. great sort). If your data is too big to fit in memory, don't rely on Euphoria's automatic memory swapping capability. Instead, sort a few thousand records at a time, and write them out to a series of temporary files. Then merge all the sorted temporary files into one big sorted file. If your data consists of integers only, and they are all in a fairly narrow range, try the bucket sort in demo\allsorts.e.
As CPU speeds increase, the gap between the speed of the on-chip cache memory and the speed of the main memory or DRAM (dynamic random access memory) becomes ever greater. You might have 256 Mb of DRAM on your computer, but the on-chip cache is likely to be only 8K (data) plus 8K (instructions) on a Pentium, or 16K (data) plus 16K (instructions) on a Pentium with MMX or a Pentium II/III. Most machines will also have a "level-2" cache of 256K or 512K. An algorithm that steps through a long sequence of a couple of thousand elements or more, many times, from beginning to end, performing one small operation on each element, will not make good use of the on-chip data cache. It might be better to go through once, applying several operations to each element, before moving on to the next element. The same argument holds when your program starts swapping, and the least-recently-used data is moved out to disk. These cache effects aren't as noticeable in Euphoria as they are in lower-level compiled languages, but they are measurable.
Euphoria lets you call routines written in 32-bit Intel machine code. On WIN32 and Linux and FreeBSD you can call C routines in .dll or .so files, and these C routines can call your Euphoria routines. You might need to call C or machine code because of something that can't be done directly in Euphoria, or you might do it for improved speed. To boost speed, the machine code or C routine needs to do a significant amount of work on each call, otherwise the overhead of setting up the arguments and making the call will dominate the time, and it might not gain you much. Many programs have some inner core operation that consumes most of the CPU time. If you can code this in C or machine code, while leaving the bulk of the program in Euphoria, you might achieve a speed comparable to C, without sacrificing Euphoria's safety and flexibility.
In version 3.0, the full Euphoria To C Translator is included in the installation package. It will translate any Euphoria program into a set of C source files that you can compile using a C compiler. The executable file that you get using the Translator should run the same, but faster than when you use the interpreter. The speed-up can be anywhere from a few percent to a factor of 5 or more.
|