10 Think Euphoria

10.1 Files



10.2 Caution

This chapter still has Python elements that may confuse you.

10.3 Persistence

Most of the programs we have seen so far are transient in the sense that they run for a short time and produce some output, but when they end, their data disappears. If you run the program again, it starts with a clean slate.

Other programs are persistent: they run for a long time (or all the time); they keep at least some of their data in permanent storage (a hard drive, for example); and if they shut down and restart, they pick up where they left off.

Examples of persistent programs are operating systems, which run pretty much whenever a computer is on, and web servers, which run all the time, waiting for requests to come in on the network.

One of the simplest ways for programs to maintain their data is by reading and writing text files. We have already seen programs that read text files; in this chapter we will see programs that write them.

An alternative is to store the state of the program in a database. In this chapter I will present the EDS database and "pickling", which make it easy to store program data.

10.4 Reading and writing

A text file is a sequence of characters stored on a permanent medium like a hard drive, flash memory, or DVD. We saw how to open and read a file previously.

To write a file, you have to open it with mode ( "w" ) as a second argument:

    atom fout = open( "output.txt", "w" )
    ? fout
        -- 3 (some unique file number)

If the file already exists, opening it in write mode clears out the old data and starts fresh, so be careful! If the file doesn't exist, a new one is created.

Rather than use ( 1 ) as an argument for output to the screen, use fout as the argument to direct output to the file.

Any of the output routines can be used to write data into the file.

    sequence line1 = "This here's the wattle,\n"
    puts(fout, line1 )

Again, Euphoria keeps track of the position in the file, so if you call puts() again, it adds the new data to the end.

    sequence line2 = "the emblem of our land.\n"
    puts(fout, line2 )

When you are done writing, you have to close the file.

    close(fout)

The file is now ready for reading.
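
The steps above can be put together in one short program. This is a minimal sketch, assuming Euphoria 4; it also checks the result of open(), and it reads the file back with gets() to confirm what was written.

```euphoria
-- Write two lines to a file, then read them back to confirm the result.
integer fout = open( "output.txt", "w" )
if fout = -1 then
    puts(1, "could not open output.txt for writing\n" )
    abort(1)
end if
puts(fout, "This here's the wattle,\n" )
puts(fout, "the emblem of our land.\n" )
close(fout)

integer fin = open( "output.txt", "r" )
object line = gets(fin)
while sequence(line) do      -- gets() returns -1 at end of file
    puts(1, line )
    line = gets(fin)
end while
close(fin)
```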



10.5 Output choices



For output of Euphoria objects you typically think of ( ? ) for quick output, print() for numbers and puts() for text.

procedure        number   text             nested sequence
?                yes      number           expanded nested
print()          yes      number           compact nested
puts()           char     yes              NO
printf()         yes      yes              flat output
pretty_print()   yes      compound / yes   expanded nested
pp()             yes      yes              compact nested

All Euphoria objects are numeric and organized as either atoms or sequences (one number or many numbers). But, you may display them as numbers or text.

value     procedure        selection
number    ?
number    print()
text      puts()
mixed     printf()         manual
mixed     pretty_print()   default
mixed     pp()             default

The ( ? ) is the quick way to see the numeric values of objects, their atom/sequence nature, and any nested organization in an expanded style. The output goes to the screen only.

All of the other routines let you choose the output device--screen or file.

Use print() to output the numeric value(s) of an object. Nested sequences are shown in a "compact" form on one line.

Use puts() to output a string to the screen or a file. All numeric value(s) will be converted to their closest character representation. Strings must be flat; no nested sequences allowed.

By choosing printf() you have full control of the output format. You may mix numeric and text output. You may even add more text and format the output. It is still your responsibility to ensure that nested sequences are parsed so they can be output as a single line.

The most versatile output routine is pretty_print(). In its default mode it shows numbers, characters as both numbers and characters (e.g. 65'A', 66'B', etc.), and nested sequences in an expanded style. Adding the argument ( {2} ) will cause strings to display as text, while still showing nested sequences in an expanded style. There are many other choices to fully customize the output.

The pp() routine may have the most convenient default choices showing numbers, strings, and nested sequences in a compact style.
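
To compare the routines side by side, the following sketch sends the same short string to each of them. It assumes Euphoria 4, where pretty_print() lives in std/pretty.e; pp() is left out because it comes from a separate include file.

```euphoria
include std/pretty.e

sequence s = "Hi"           -- the same as {72, 105}

print(1, s )                -- numbers: {72,105}
puts(1, "\n" )
puts(1, s )                 -- text: Hi
puts(1, "\n" )
printf(1, "%s has %d characters\n", { s, length(s) } )
                            -- Hi has 2 characters
pretty_print(1, s, {2} )    -- option {2} asks for strings to be shown as text
puts(1, "\n" )
```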

10.6 Formatted output

The second argument of puts() has to be a string, so if we want to put other values in a file, we have to convert them to strings. The easiest way to do that is with sprint():

    include std/text.e
    atom x = 52
    puts(fout, sprint(x) )

An alternative is to use printf(), which writes formatted output.

The second argument is the format string, and the third argument is an atom or a sequence of values. The output contains the values, formatted according to the format string.

As an example, the format sequence ( %d ) means that the value should be formatted as an integer (d stands for "decimal"):

    atom camels = 42
    printf(fout, "%d", camels )
        -- 42

The result is the string "42", which is not to be confused with the integer value 42. A format sequence can appear anywhere in the format string, so you can embed a value in a sentence:

    atom camels = 42
    printf(fout, "I have spotted %d camels.", camels )
        -- I have spotted 42 camels.
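
When the formatted text is wanted as a string rather than as output, sprintf() takes the same format string and returns the result instead of writing it. A minimal sketch:

```euphoria
-- sprintf() builds the string; nothing is written until puts() is called
sequence msg = sprintf( "I have spotted %d camels.", 42 )
puts(1, msg & "\n" )
    -- I have spotted 42 camels.
? length(msg)
    -- 25
```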



The format sequence ( %g ) formats the next element in a "general" numeric format, and ( %s ) formats the next item as a string:

    printf(fout, "In %g years I have spotted %g %s.", { 3, 0.1, "camels" } )
        -- In 3 years I have spotted 0.1 camels.

The number of elements in the argument has to match the number of format sequences in the string. Also, the types of the elements have to match the format sequences:

    printf(fout, "%d %d %d", {1,2} )
        -- error

    printf(fout, "%d", "dollars" )
        -- error, you wanted to use %s

In the first example, there aren't enough elements; in the second, the element is the wrong type.

Formatted output with printf() is powerful but difficult to use.

10.7 Filenames and paths

Files are organized into directories (also called "folders"). Every running program has a "current working directory," which is the default directory for most operations. For example, when you open a file for reading, Euphoria looks for it in the current working directory.

The std/filesys.e include file provides routines for working with files and directories:

    include std/filesys.e
    sequence cwd = current_dir()
    puts(1, cwd )
        -- /home/dinsdale

cwd stands for "current working directory." The result in this example is /home/dinsdale, which is the home directory of a user named dinsdale.

A string like cwd that identifies a file is called a path. A relative path starts from the current directory; an absolute path starts from the topmost directory in the file system.

The paths we have seen so far are simple filenames, so they are relative to the current directory. To find the absolute path to a file, you can use canonical_path(); to split a path into its directory, file name, and extension, use pathinfo():

    include std/filesys.e
    puts(1, canonical_path( "memo.txt" ) )
        -- /home/dinsdale/memo.txt

file_exists() checks whether a file or directory exists:

    include std/filesys.e
    ? file_exists( "memo.txt" )
        -- 1 for yes
        -- 0 for no

You may use this routine to determine if a directory exists:

    include std/filesys.e
    ? file_exists( "/root" )
        -- 1
If it exists, file_type() tells you whether it's a directory:

    include std/filesys.e
    ? file_type( "/root" )
        -- 2

The return values are: ( -1 ) undefined, ( 0 ) not found, ( 1 ) it's a file, and ( 2 ) it's a directory.
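
The numeric return values also have names in std/filesys.e, which makes the tests easier to read. The following is a sketch, assuming Euphoria 4; describe() is a hypothetical helper written for this illustration.

```euphoria
include std/filesys.e

-- Hypothetical helper: report what kind of thing a path refers to
procedure describe( sequence path )
    integer t = file_type( path )
    if t = FILETYPE_DIRECTORY then
        puts(1, path & " is a directory\n" )
    elsif t = FILETYPE_FILE then
        puts(1, path & " is a file\n" )
    elsif t = FILETYPE_NOT_FOUND then
        puts(1, path & " was not found\n" )
    else
        puts(1, path & " could not be identified\n" )
    end if
end procedure

describe( current_dir() )
describe( "memo.txt" )
```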

dir() returns complete information for the contents of a directory:

    include std/filesys.e
    object d = dir( current_dir() )
    pp(1, d )
        --
    for i=1 to length(d) do
        pp(1, d[i][1] & "\n" )
    end for
        -- "."
        -- ".."
        -- "01-cwd.ex"
        -- "print.e"

To demonstrate these functions, the following example "walks" through a directory, prints the names of all the files, and calls itself recursively on all the directories.

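    include std/filesys.e
    procedure walk( sequence folder )
        object entries = dir( folder )
        if atom( entries ) then return end if    -- the directory could not be read
        for i = 1 to length( entries ) do
            sequence name = entries[i][D_NAME]
            if find( name, { ".", ".." } ) then continue end if
            sequence path = folder & SLASH & name
            if find( 'd', entries[i][D_ATTRIBUTES] ) then
                walk( path )
            else
                puts(1, path & "\n" )
            end if
        end for
    end procedure

The expression folder & SLASH & name joins a directory and a file name into a complete path.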

10.8 Catching errors

A lot of things can go wrong when you try to read and write files. Reading from a nonexistent file is an error.

    atom fn = open( "badfile.txt", "r" )
    object line = gets(fn)
        -- error message:
        -- bad file number (-1)

If you try to output to a file number that is not open:

    puts(4, "can't write this" )
        -- error message:
    -- file number 4 is not open
If you try to open a file that doesn't exist, you get:

    atom fn = open( "badfile.ex", "r" )
    ? fn
        -- -1

And if you try to open a directory for reading, you get

    atom fn = open( "/root", "r" )
        -- bad file number (-1)

To avoid errors like this you should test for the existence of a file or directory before you try using it.

    atom fn = open( "myfile.txt", "r" )
    if fn=-1 then
        puts(1, "file does not exist, try again..." )
    else
        puts(1, "ok" )
        -- continue with your program
    end if
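
Both checks, file_exists() and the result of open(), can be combined in one routine. This is a sketch under the assumption of Euphoria 4; count_lines() is a hypothetical helper, and returning -1 on failure is a choice made for this example.

```euphoria
include std/filesys.e

-- Hypothetical helper: returns the number of lines in a file, or -1 on failure
function count_lines( sequence fname )
    if not file_exists( fname ) then
        return -1
    end if
    integer fn = open( fname, "r" )
    if fn = -1 then
        return -1               -- exists but could not be opened
    end if
    integer count = 0
    while sequence( gets(fn) ) do
        count += 1
    end while
    close(fn)
    return count
end function

? count_lines( "badfile.txt" )  -- -1 if badfile.txt does not exist
```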

10.9 Databases

A database is a file that is organized for storing data. Most databases are organized like a "dictionary" in the sense that they map from keys to values. The biggest difference is that the database is on disk (or other permanent storage), so it persists after the program ends.

The include file std/eds.e provides a database system written in Euphoria. As a result this is an extremely flexible system. It provides an interface for creating and updating database files.

As an example, I'll create a database that contains captions for image files.

If a database does not exist yet, you will have to create one using db_create():

    include std/eds.e
    if db_create( "mydata", DB_LOCK_NO ) != DB_OK then
        puts(1, "Couldn't create the database" )
    end if

Opening a database is similar to opening other files:

    include std/eds.e
    if db_open( "mydata", DB_LOCK_NO ) != DB_OK then
        puts(1, "Couldn't open the database" )
    end if
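
Records are stored in tables, so create a table before adding data. db_insert() then stores a value under a key:

    ? db_create_table( "captions" )
        -- 0 (DB_OK)
    ? db_insert( "cleese.png", "Photo of John Cleese." )
        -- 0 (DB_OK)

When you access one of the items, EDS reads the file. db_find_key() returns the record number for a key, and db_record_data() returns the value stored in that record:

    integer rec = db_find_key( "cleese.png" )
    puts(1, db_record_data( rec ) )
        -- Photo of John Cleese.

To change the value stored under an existing key, use db_replace_data(), which replaces the old value:

    db_replace_data( rec, "Photo of John Cleese doing a silly walk." )
    puts(1, db_record_data( rec ) )
        -- Photo of John Cleese doing a silly walk.

The records in a table are numbered from 1 to db_table_size(), so you can visit every key with a for statement.

    for i = 1 to db_table_size() do
        puts(1, db_record_key( i ) & "\n" )
    end for

As with other files, you should close the database when you are done:

    db_close()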

10.10 euSQLite

Euphoria can interface to an external database such as SQLite. SQLite is a database engine written in C; it is small, fast, and not constrained by data-types. Ray Smith has written a wrapper that allows Euphoria to use SQLite. You may find euSQLite at Sourceforge.

The following is taken from the euSQLite documentation:

Basic structure

This is the skeleton of a typical euSQLite program.

    include eusqlite.ew
 
    atom db
    sequence data
 
    db = sqlite_open("filename_of_your_database",0)
 
        -- ... do some processing / user input etc
 
    data = sqlite_get_table(db, "{SQL statements go here}")
    if sqlite_last_err_no != SQLITE_OK then
        -- ... do some error processing
    end if
 
    --... do some more processing / user input etc
 
    sqlite_close(db)

It basically comes down to:

 
    Include                -- include euSQLite.ew
 
    Open your database     -- sqlite_open()
 
    Execute SQL statements -- sqlite_get_table()
 
    Close your database    -- sqlite_close()

The authors of SQLite make the claim that it is the "most widely distributed" database.

10.11 Pickling

The Euphoria database is powerful and versatile. Keys and values may be any Euphoria object. However, it may be too elaborate for the simplest database needs.

The conventional approach is to convert your data into text form and save it in a text file. You then have to write a parsing program that reads the text file and recovers the information. Proprietary programs favor saving data in proprietary binary formats--a very unfriendly strategy.

Euphoria may save or read data in either text or binary formats. You may therefore access the data stored by any non-Euphoria program. For this you will need to know how the data is structured.

For the simplest needs it may be enough to read and write Euphoria objects to a file. In effect you invent your own database and make it as simple as needed. We will call this pickling. You can use get() to input the next human-readable Euphoria object from a device. Objects are converted into their numeric values. You may use gets() to read the next string (which may include the ( \n ) terminator) from a device. Use ( 0 ) for keyboard input and an open file number to read from a text file.

Several output procedures will save objects in a human readable form in a text file. Use print() for numbers, use puts() for strings, and use pretty_print() or pp() for objects containing mixed numbers and strings.
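
A round trip with print() and get() might look like the following. This is a minimal sketch, assuming Euphoria 4, where get() is provided by std/get.e and returns a status together with the value; the file name record.txt and the sample data are invented for the example.

```euphoria
include std/get.e

sequence record = { "camels", 42, { 1.5, 2.5 } }

-- Save: print() writes the object in a form that get() can read back
integer fn = open( "record.txt", "w" )
print(fn, record )
close(fn)

-- Load: get() returns { status, value }
fn = open( "record.txt", "r" )
sequence result = get(fn)
close(fn)

if result[1] = GET_SUCCESS then
    ? equal( result[2], record )    -- 1 when the object survived the round trip
else
    puts(1, "could not read the record\n" )
end if
```

Note that the string "camels" comes back as a sequence of numbers, which is the same object as far as Euphoria is concerned.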

10.12 Pipes

Most operating systems provide a command-line interface, also known as a shell. Shells usually provide commands to navigate the file system and launch applications. For example, in Unix, you can change directories with cd, display the contents of a directory with ls, and launch a web browser by typing (for example) firefox.

Any program that you can launch from the shell can also be launched from Euphoria using a pipe. A pipe is an object that represents a running process.

For example, the Unix command ls -l normally displays the contents of the current directory (in long format). You can launch ls with

    cmd = 'ls -l'
    fp = os.popen(cmd)

The argument is a string that contains a shell command. The return value is a file pointer that behaves just like an open file. You can read the output from the ls process one line at a time with readline or get the whole thing at once with read:

    res = fp.read()

When you are done, you close the pipe like a file:

    stat = fp.close()
    print stat
        # prints: None

The return value is the final status of the ls process; None means that it ended normally (with no errors).

A common use of pipes is to read a compressed file incrementally; that is, without uncompressing the whole thing at once. The following function takes the name of a compressed file as an argument and returns a pipe that uses gzip to decompress the contents:

    def open_gzip(filename):
        cmd = 'gunzip -c ' + filename
        fp = os.popen(cmd)
        return fp

If you read lines from fp one at a time, you never have to store the uncompressed file in memory or on disk.
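
In Euphoria, one simple way to capture the output of a shell command is to let the shell redirect it into a file and then read that file. This is a different technique from a pipe, since the whole output is stored on disk first. The sketch assumes a Unix-like shell and Euphoria 4; listing.txt is a hypothetical scratch file.

```euphoria
include std/io.e

-- Run the command; the shell writes its output into listing.txt.
-- Mode 2 means: do not wait for a key press and do not restore the screen.
system( "ls -l > listing.txt", 2 )

-- read_lines() returns a sequence of lines, or -1 if the file cannot be read
object lines = read_lines( "listing.txt" )
if sequence(lines) then
    for i = 1 to length(lines) do
        puts(1, lines[i] & "\n" )
    end for
end if
```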

10.13 Writing include files

Any file that contains Euphoria code can be used as an include file. For example, suppose you have a file named wc.e with the following code:

    include std/io.e
    export function linecount( sequence filename )
        atom fh = open( filename, "r" )
        sequence data = read_lines( fh )
        close(fh)
        return length( data )
    end function

A program can then include wc.e and call linecount():

    include wc.e
    print(1, linecount( "wc.e" ) )

If you run this program, it reads wc.e and prints the number of lines in the file, which is 7.

If you do not export the routine it remains invisible to the calling file.

So that's how you write include files in Euphoria.

Note: If you include a file that has already been included, Euphoria does nothing. It does not re-read the file, even if it has changed.

10.14 File input/output

You may read an entire file all at once using the function read_lines():

    include std/io.e
    atom fn = open( "myfile.txt", "r" )
    object data = read_lines( fn )
    close(fn)
    ? data

The entire file is contained in one object, consisting of several strings of text--one string per line.

Similarly, the function write_lines() will write a sequence of strings to a file.

    include std/io.e
    atom fn = open( "newfile.txt", "w" )
    -- write_lines() returns 1 on success and -1 on failure
    if write_lines(fn, data ) = 1 then
        puts(1, "success" )
    else
        puts(1, "could not write data" )
    end if
    close(fn)

10.15 Debugging

When you are reading and writing files, you might run into problems with whitespace. These errors can be hard to debug because spaces, tabs and newlines are normally invisible:

    sequence s = "1 2\t 3\n 4"
    puts(1, s )
        -- 1 2     3
        --  4

The built-in procedure print() can help. It takes any object as an argument and displays its numeric values. For strings, that makes whitespace characters stand out: a space is 32, a tab is 9, and a newline is 10:

    print(1, s )
        -- {49,32,50,9,32,51,10,32,52}

This can be helpful for debugging.

One other problem you might run into is that different systems use different characters to indicate the end of a line. Some systems use a newline, represented \n. Others use a return character, represented \r. Some use both. If you move files between different systems, these inconsistencies might cause problems.

For most systems, there are applications to convert from one format to another. You can find them (and read more about this issue) at wikipedia.org/wiki/Newline. Or, of course, you could write one yourself.
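
Writing such a converter yourself takes only a few lines. The following is a sketch; to_unix_newlines() is a hypothetical helper that turns both "\r\n" pairs and lone "\r" characters into "\n".

```euphoria
-- Hypothetical helper: convert "\r\n" and lone "\r" line endings to "\n"
function to_unix_newlines( sequence text )
    sequence result = ""
    integer i = 1
    while i <= length(text) do
        if text[i] = '\r' then
            result &= '\n'
            -- skip the '\n' of a "\r\n" pair so it is not counted twice
            if i < length(text) and text[i+1] = '\n' then
                i += 1
            end if
        else
            result &= text[i]
        end if
        i += 1
    end while
    return result
end function

print(1, to_unix_newlines( "a\r\nb\rc\n" ) )
    -- {97,10,98,10,99,10}
```

When you read a file for conversion, open it in binary mode ( "rb" ) so that the return characters reach your program unchanged.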




10.16 Glossary