Euphoria To C Translator
1. Introduction

The Euphoria to C Translator will translate any Euphoria program into equivalent C source code.

There are versions of the translator for Windows, DOS, Linux and FreeBSD. After translating a Euphoria program to C, you can compile and link using one of the supported C compilers. This will give you an executable file that will typically run much faster than if you used the Euphoria interpreter.

The translator can translate/compile *itself* into an executable file for each platform. The translator is also used in translating/compiling the front-end portion of the interpreter. The source code for the Translator is in euphoria\source. It's written 100% in Euphoria.

The Translator currently works with GNU C on Linux or FreeBSD, with either Watcom C or DJGPP C on DOS, and with either Watcom C, Lcc or Borland 5.5 on Windows. These are all free compilers. GNU C will exist already on your Linux or FreeBSD system. The others can be downloaded from their respective Web sites.

For Windows, we strongly recommend Watcom or Borland over Lcc. Lcc is still actively being developed and is getting better, but has some bugs that will make it difficult for you to compile a large Windows program correctly. Watcom and Borland are both rock solid. Watcom usually produces slightly smaller, slightly faster executables, but Borland compiles much faster.

The Translator has been tested with GNU C and the ncurses library available with Red Hat Linux 5.2 or later, and FreeBSD 4.5 or later. It has been tested with Watcom C/C++ 9.5, 10.6 and 11.0.

Watcom is open source and free. The Watcom DOS32 package includes the CauseWay DOS extender and file compressor. CauseWay is now open source and free. You can find out more about it at:

    http://www.devoresoftware.com

emake.bat and objfiles.lnk will link in the CauseWay extender automatically. Other DOS extenders, such as DOS4GW, do not work well with the Translator.
The Translator looks for "WATCOM", "LCC", "BORLAND" or "DJGPP" as either environment variables or directories on your PATH. It will generate an emake.bat file that invokes the appropriate compiler and linker.

Notes:
Running the Translator is similar to running the Interpreter. On DOS you would type:

    ec allsorts.ex

or

    ec allsorts

but instead of running the allsorts.ex program, the Translator will create several C source files.

Anyone can run the Translator. It's included in euphoria\bin along with the interpreter. To compile and link the C files, you need to install one of the supported C compilers. The Translator creates a batch file called emake.bat that does all the compiling and linking steps for you, so you don't actually have to know anything about C or C compilers. Just type:

    emake

When the C compiling and linking is finished, you will have a file called:

    allsorts.exe

and the C source files will have been removed to avoid clutter.

When you run allsorts.exe, it should run the same as if you had typed:

    ex allsorts

After creating your executable file, emake removes all the C files that were created. If you want to look at these files, run the translator again and look at the files before running emake.

Note to Linux and FreeBSD users:
Note to Borland and Lcc users:
Command-Line Options

If you happen to have more than one C compiler for a given platform, you can select the one you want to use with a command-line option:

    ecw -bor pretend.exw

Normally, after building your .exe file, the emake batch file will delete all C files and object files produced by the Translator. If you want emake to keep these files, add the -keep option to the Translator command line, e.g.

    ec -wat -keep sanity.ex

To make a Windows .dll file, or Linux or FreeBSD .so file, just add -dll to the command line, e.g.

    ecw -bor -dll mylib.ew

To make a Windows console program instead of a Windows GUI program, add -con to the command line, e.g.

    ecw -bor -con myprog.exw

To increase or decrease the total amount of stack space reserved for your program, add -stack nnnn to the command line, e.g.

    ec -stack 100000 myprog.ex

The total stack space (in bytes) that you specify will be divided up among all the tasks that you have running (assuming you have more than one). Each task has its own private stack space. If it exceeds its allotment, you'll get a run-time error message identifying the task and giving the size of its stack space. Most non-recursive tasks can run with call stacks as small as 2000 bytes, but to be safe, you should allow more than this. A deeply-recursive task could use a great deal of space. It all depends on the maximum levels of calls that a task might need. At run-time, as your program creates more simultaneously-active tasks, the stack space allotted to each task will tend to decrease.

To make a DOS program, compiled by WATCOM, that uses fast hardware floating-point instructions, add -fastfp to the command line, e.g.

    ec -wat -fastfp crunch.ex

By default, Euphoria for DOS calls routines to test if hardware floating-point instructions are available. If they are not, then slower software emulation code is used. When -fastfp is specified, the compiled code will assume the existence of hardware floating-point. This can cause floating-point intensive programs to run about twice as fast, but they will fail to run at all on old 486's and 386's that are lacking hardware floating-point support.

With -fastfp, emake.bat chooses faster WATCOM C compiler options, and emake.bat must also link in ecfastfp.lib instead of ec.lib. On all other platforms, Euphoria uses fast hardware floating-point instructions, and the operating system handles the case where hardware floating-point is missing.
Simply by adding -dll to the command line, the Translator will build a Windows .dll (Linux/FreeBSD .so) file instead of an executable program.

You can translate and compile a set of useful Euphoria routines, and share them with other people, without giving them your source. Furthermore, your routines will likely run much faster when translated and compiled. Both translated/compiled and interpreted programs will be able to use your library.

Only the global Euphoria procedures and functions, i.e. those declared with the "global" keyword, will be exported from the .dll (.so).

Any Euphoria program, whether translated/compiled or interpreted, can link with a Euphoria .dll (.so) using the same mechanism that lets you link with a .dll (.so) written in C. The program first calls open_dll() to open the .dll or .so file, then it calls define_c_func() or define_c_proc() for any routines that it wants to call. It calls these routines using c_func() and c_proc(). See library.doc for the details.

The routine names exported from a Euphoria .dll will vary depending on which C compiler you use.

GNU C on Linux or FreeBSD exports the names exactly as they appear in the C code produced by the Translator, e.g. a Euphoria routine

    global procedure foo(integer x, integer y)

would be exported as "_0foo" or maybe "_1foo" etc. The underscore and digit are added to prevent naming conflicts. The digit refers to the Euphoria file where the symbol is defined. The main file is numbered as 0. The include files are numbered in the order they are encountered by the compiler. You should check the C source to be sure.

Lcc would export foo() as "__0foo@8", where 8 is the number of parameters (2) times 4. You can check the .def file created by the Translator to see all the exported names.

For Borland the Translator also creates a .def file, but this .def file renames the exported symbols back into the same names that you used in your Euphoria source, so foo() would be exported as "foo".
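As a rough illustration of the GNU C and Lcc naming patterns just described, the following C sketch builds the expected export name from a file number, a routine name and a parameter count. The helper names gnu_export_name() and lcc_export_name() are hypothetical and are not part of the Translator or its libraries. The sketch only mirrors the "_0foo" and "__0foo@8" examples given in the text, so you should still check the generated C source or .def file for the real names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: GNU C style name, "_" + file number + routine name.
   File 0 is the main file; include files are numbered in the order seen. */
static void gnu_export_name(char *out, size_t size, int file_no, const char *name)
{
    assert(out != NULL && size > 0);
    snprintf(out, size, "_%d%s", file_no, name);
}

/* Hypothetical helper: Lcc style name as described in the text, i.e. one
   more leading underscore and an "@N" suffix where N = parameters * 4. */
static void lcc_export_name(char *out, size_t size, int file_no,
                            const char *name, int n_params)
{
    size_t used;

    assert(out != NULL && size >= 2);
    out[0] = '_';                                    /* extra underscore  */
    gnu_export_name(out + 1, size - 1, file_no, name);
    used = strlen(out);
    snprintf(out + used, size - used, "@%d", n_params * 4);
}
```

For the routine foo(integer x, integer y) defined in the main file, lcc_export_name(buf, sizeof buf, 0, "foo", 2) yields the same string that the text gives for Lcc, and that is the string you would pass to define_c_proc().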
For Watcom the same renaming as with Borland occurs, but instead of a .def file, an EXPORT command is added to objfiles.lnk for each exported symbol.

With Borland and Watcom you can edit the .def or objfiles.lnk file, and rerun emake.bat, to rename the exported symbols, or remove ones that you don't want to export. With Lcc you can remove symbols but you can't rename them.

Having nice exported names is not critical, since the name need only appear once in each Euphoria program that uses the .dll, i.e. in a single define_c_func() or define_c_proc() statement. The author of a .dll should probably provide his users with a Euphoria include file containing the necessary define_c_func() and define_c_proc() statements, and he might even provide a set of Euphoria "wrapper" routines to call the routines in the .dll.

When you call open_dll(), any top-level Euphoria statements in the .dll or .so will be executed automatically, just like a normal program. This gives the library a chance to initialize its data structures prior to the first call to a library routine. For many libraries no initialization is required.

To pass Euphoria data (atoms and sequences) as arguments, or to receive a Euphoria object as a result, you will need to use the following constants in euphoria\include\dll.e:

    -- Euphoria types for .dll (.so) arguments and return values:
    global constant
        E_INTEGER  = #06000004,
        E_ATOM     = #07000004,
        E_SEQUENCE = #08000004,
        E_OBJECT   = #09000004

Use these in define_c_proc() and define_c_func() just as you currently use C_INT, C_UINT etc. to call C .dll's and .so's.

Currently, file numbers returned by open(), and routine id's returned by routine_id(), can be passed and returned, but the library and the main program each have their own separate ideas of what these numbers mean. Instead of passing the file number of an open file, you could instead pass the file name and let the .dll (.so) open it. Unfortunately there is no simple solution for passing routine id's.
This might be fixed in the future.

A Euphoria .dll or .so currently may not execute any multitasking operations. The Translator will give you an error message about this.

Euphoria .dlls (.so's) can also be used by C programs as long as only 31-bit integer values are exchanged. If a 32-bit pointer or integer must be passed, and you have the source to the C program, you could pass the value in two separate 16-bit integer arguments (upper 16 bits and lower 16 bits), and then combine the values in the Euphoria routine into the desired 32-bit atom.
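The two-argument workaround can be sketched from the C side as follows. This is a minimal illustration, not code from the Translator: the helper names are hypothetical, and the range constants assume the standard Euphoria integer range of -1073741824 to 1073741823. combine16() only mirrors the arithmetic (upper * 65536 + lower) that the Euphoria routine would perform after receiving the two halves.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Euphoria 31-bit integer range. */
#define EU_MIN_INT (-1073741824L)
#define EU_MAX_INT (1073741823L)

/* Nonzero if v can be exchanged directly as a Euphoria integer. */
static int fits_euphoria_integer(long v)
{
    return v >= EU_MIN_INT && v <= EU_MAX_INT;
}

/* Split a full 32-bit value into two halves, each safe to pass. */
static unsigned int upper16(uint32_t v) { return (unsigned int)(v >> 16); }
static unsigned int lower16(uint32_t v) { return (unsigned int)(v & 0xFFFFu); }

/* The recombination the Euphoria side would do: hi * 65536 + lo. */
static uint32_t combine16(unsigned int hi, unsigned int lo)
{
    assert(hi <= 0xFFFFu && lo <= 0xFFFFu);
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}
```

A C caller would check fits_euphoria_integer() first and fall back to passing upper16(v) and lower16(v) as two separate arguments only when the value is out of range.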
On DOS32 with Watcom, if the Translator finds the CauseWay files, cwc.exe and le23p.exe in euphoria\bin, it will add commands to emake.bat that will compress your executable file. If you don't want compression, you can edit emake.bat, or remove or rename cwc.exe and/or le23p.exe.

On Linux, FreeBSD, Windows, and DOS32 with DJGPP, emake does not include a command to compress your executable file. If you want to do this we suggest you try the free UPX compressor. You can get UPX from:

    http://upx.sourceforge.net

Large Win32Lib-based .exe's produced by the Translator can be compressed by UPX to about 15% of their original size, and you won't notice any difference in start-up time.

The Translator deletes routines that are not used, including those from the standard Euphoria include files. After deleting unused routines, it checks again for more routines that have now become unused, and so on. This can make a big difference, especially with Win32Lib-based programs where a large file is included, but many of the included routines are not used in a given program.

Nevertheless, your compiled executable file will likely be larger than the same Euphoria program bound with the interpreter back-end. This is partly due to the back-end being a compressed executable. Also, Euphoria statements are extremely compact when stored in a bound file. They need more space after being translated to C, and compiled into machine code. Future versions of the Translator will produce faster and smaller executables.

All Euphoria programs can be translated to C, and with just a few exceptions noted below, will run the same as with the Interpreter (but hopefully faster).

The Interpreter and Translator share the same parser, so you will get the same syntax errors, variable not declared errors etc. with either one.

The Interpreter automatically expands the call stack (until memory is exhausted), so you can have a huge number of levels of nested calls.
Most C compilers, on most systems, have a pre-set limit on the size of the stack. Consult your compiler or linker manual if you want to increase the limit, for example if you have a recursive routine that might need thousands of levels of recursion. Modify the link command in emake.bat. For Watcom C, use OPTION STACK=nnnn, where nnnn is the number of bytes of stack space.

Note:
You should debug your program with the Interpreter. The Translator checks for certain run-time errors, but in the interest of speed, most are not checked. When translated C code crashes you'll typically get a very cryptic machine exception. In most cases, the first thing you should do is run your program with the Interpreter, using the same inputs, and preferably with type_check turned on. If the error only shows up in translated code, you can use with trace and trace(3) to get a ctrace.out file showing a circular buffer of the last 500 Euphoria statements executed. If a translator-detected error message is displayed (and stored in ex.err), you will also see the offending line of Euphoria source whenever with trace is in effect. with trace will slow your program down, and the slowdown can be extreme when trace(3) is also in effect.

As far as RDS is concerned, any executable programs or .dll's that you create with this Translator without modifying an RDS translator library file, may be distributed royalty-free. You are free to incorporate any Euphoria files provided by RDS into your application.

In January 2000, the CauseWay DOS extender was donated to the public domain by Michael Devore. He has surrendered his copyright, and encourages anyone to use it freely, including for commercial use.

In general, if you wish to use Euphoria code written by 3rd parties, you had better honor any restrictions that apply. If in doubt, you should ask for permission.

On Linux, FreeBSD and DJGPP for DOS32, the GNU Library licence will normally not affect programs created with this Translator. Simply compiling with GNU C does not give the Free Software Foundation any jurisdiction over your program. If you statically link their libraries you will be subject to their Library licence, but the standard compile/link procedure in emake does not statically link any FSF libraries, so there should be no problem.
The ncurses library is the only one statically linked, and although the Free Software Foundation now holds the copyright, ncurses is not subject to the GNU Library licence, since it was donated to FSF by authors who did not wish the GNU licence to apply to it. See ncurses.h for the copyright notice.

The Allegro graphics library, used by DJGPP, is referred to as "Giftware" in their documentation, and they allow you to redistribute it as part of your program. They ask for, but do not require, some acknowledgement.

Disclaimer:
The various C compilers are not equal in optimization ability.
Watcom, GNU C and DJGPP produce the fastest code. Borland is fairly good.
Lcc lags slightly behind the others, even when its -O flag is used.
Borland compiles the fastest. Watcom compiles the slowest.
Typical user-defined types will not slow you down. Since your program
is supposed to be free of type_check errors, types are ignored by
the Translator, unless you call them directly with normal
function calls. The one exception is when a
user-defined type routine has side-effects (i.e. it sets a global variable,
performs pokes into memory, I/O etc.). In that case, if
with type_check is in effect, the
Translator will emit code to
call the type routine and report any type_check failure that results.
On Windows and DOS we have left out the /ol loop optimization
for Watcom's wcc386. We found in a couple of rare cases that this option
led to incorrect machine code being emitted by the Watcom C compiler.
If you add it back in to your own version of emake.bat you might
get a slight improvement in speed, with a slight risk of buggy code.
For DJGPP you might try -O6 instead of -O2.
For DOS we use the Watcom /fpc option which generates calls to run-time
routines to perform floating-point operations. If the machine has
floating-point hardware it will be used by the routine, otherwise
software emulation will be used. This slows things down somewhat,
and isn't needed on Pentiums, but it guarantees that your program
will run on all 386 and 486 machines, even if they lack floating-point
hardware. The DOS run-time library,
ec.lib, was built this way, so
you can't simply remove this option.
On Linux or FreeBSD you could try the -O3 option of gcc instead of -O2.
It will
"in-line" small routines, improving speed slightly, but creating a
larger executable. You could also try the
Intel C++ Compiler for Linux. It's compatible with GNU C, but
some adjustments to emake might be required.

Many large programs have been successfully translated and compiled using each of the supported C compilers, and the Translator is now quite stable.

Note:
In some cases a huge Euphoria routine is translated to C, and it proves to be too large for the C compiler to process. If you run into this problem, make your Euphoria routine smaller and simpler. You can also try turning off C optimization in emake.bat for just the .c file that fails. Breaking up a single constant declaration of many variables into separate constant declarations of a single variable each may also help.

Euphoria has no limits on the size of a routine, or the size of a file, but most C compilers do. The Translator will automatically produce multiple small .c files from a large Euphoria file to avoid stressing the C compiler. It won't, however, break a large routine into smaller routines.
Post bug reports on EUforum.