EEBAX

Euphoria Event Based API for XML
Copyright (c) Thomas Parslow 2002
tom@almostobsolete.net

1 Introduction

2 Usage

2.1 Parsing

2.2 XML Creation

3 Reference

3.1 Events

StartDocument

EndDocument

StartElement

EndElement

Characters

Comment

ParseError

IgnorableWhitespace

ProcessingInstruction

StartPrefixMapping

EndPrefixMapping

XML

3.2 General

eebax_NewInstance

eebax_DestroyInstance

eebax_ResetInstance

eebax_StartDocument

eebax_EndDocument

3.3 Parsing

eebax_Parse

3.4 XML Creation

eebax_StartElement

eebax_EndElement

eebax_Characters

eebax_ProcessingInstruction

eebax_Comment

3.5 Utilities

eebax_EncodeUTF8

eebax_DecodeUTF8

eebax_EncodeUTF16

eebax_DecodeUTF16

1 Introduction

EEBAX is my second attempt at an XML parser. My first was xml.e (available on the Euphoria archive page), which was not very good at all: it was slow, not at all compliant, etc. EEBAX should be a lot better :)

EEBAX is an event-based parser. This means that for each piece of XML it encounters (for example an opening or closing tag, or some character data) it generates an event which the calling application can process. This is a lot more flexible than loading the entire XML document into a structure in memory (or a sequence) and returning that. The application can build its own structures using the events (or use xmltree.e).

EEBAX can also be used to create XML (using the XML event).

2 Usage

2.1 Parsing

See example1.ex.

To use EEBAX you must first create a parser instance. You can do this with the eebax_NewInstance function:

    atom eebax
    eebax = eebax_NewInstance()

Each instance can only process one document at a time, but you can create as many instances as you need.

Now add some event handlers for the events you wish to process (see below for a full list of available events). These will be called as the document is parsed:

    procedure onStartElement(integer hInst,sequence Uri, sequence LocalName, sequence QName, sequence Atts)
        puts(1,"Start Element: " & QName & "\n")
    end procedure

    procedure onEndElement(integer hInst,sequence Uri, sequence LocalName, sequence QName)
        puts(1,"End Element: " & QName & "\n")
    end procedure

    procedure onCharacters(integer hInst,sequence Chars)
        puts(1,"Character Data: " & Chars & "\n")
    end procedure

Next, assign the events to the appropriate handlers:

    eebax_SetStartElementEvent(eebax,routine_id("onStartElement"))
    eebax_SetEndElementEvent(eebax,routine_id("onEndElement"))
    eebax_SetCharactersEvent(eebax,routine_id("onCharacters"))

When you are ready to start processing an XML document, call the eebax_StartDocument procedure. This prepares the instance to receive a document:

    eebax_StartDocument(eebax)

Next, call eebax_Parse one or more times until you have passed the whole document to it. As you call eebax_Parse, the event handlers you defined above will be called as appropriate. eebax_Parse will normally return 1, but if there is a parse error it will return 0 (in addition to raising the ParseError event, if it is handled).

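    atom fn
    sequence in

    -- Open in binary mode so that UTF-16 documents are read correctly
    fn = open("test.xml","rb")
    if fn = -1 then
        puts(1,"Unable to open test.xml\n")
        abort(1)
    end if
    while 1 do
        -- We're reading in one kilobyte at a time from the file
        -- (get_bytes is defined in get.e, which must be included)
        in = get_bytes(fn,1024)
        if not eebax_Parse(eebax,in) then
            puts(1,"XML document invalid\n")
            abort(1)
        end if
        -- A short read means the end of the file has been reached
        if length(in) < 1024 then
            exit
        end if
    end while
    -- Close the file
    close(fn)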

When you have passed the whole document to eebax_Parse, call eebax_EndDocument. The EEBAX instance will be set back to the state it was in when you first created it and can be used to parse further XML documents.

    eebax_EndDocument(eebax)

Finally, when you no longer have any use for the parser instance, you should destroy it with eebax_DestroyInstance:

    eebax_DestroyInstance(eebax)

This will free up any resources used.
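
As a rough sketch of how an application might keep its own state across events, the handlers below track element nesting depth and print an indented outline of the document. The file name eebax.e in the include line is an assumption, as is the expectation that every StartElement event (including one for an empty element) is matched by an EndElement event.

    include eebax.e  -- assumed name of the EEBAX library file

    integer depth
    depth = 0

    procedure onStartElement(integer hInst, sequence Uri, sequence LocalName, sequence QName, sequence Atts)
        -- Indent by two spaces per nesting level, then go one level deeper
        puts(1, repeat(' ', depth * 2) & QName & "\n")
        depth = depth + 1
    end procedure

    procedure onEndElement(integer hInst, sequence Uri, sequence LocalName, sequence QName)
        -- Leaving an element: come back up one level
        depth = depth - 1
    end procedure

    atom eebax
    eebax = eebax_NewInstance()
    eebax_SetStartElementEvent(eebax, routine_id("onStartElement"))
    eebax_SetEndElementEvent(eebax, routine_id("onEndElement"))

    eebax_StartDocument(eebax)
    if eebax_Parse(eebax, "<a><b><c/></b><d/></a>") then
        eebax_EndDocument(eebax)
    end if
    eebax_DestroyInstance(eebax)

The same pattern (a variable updated on StartElement and EndElement) can be extended to a stack of open elements if the application needs to know its ancestors.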

2.2 XML Creation

See example2.ex.

To have EEBAX create XML for you, first create an instance of the parser as you would for parsing a document, then create a handler to receive the generated XML:

    atom eebax
    eebax = eebax_NewInstance()

    -- All this example handler does is print the XML to the console,
    -- however it could be written to do something else with it
    procedure onXML(integer hInst, sequence XML)
        puts(1,XML)
    end procedure

    eebax_SetXMLEvent(eebax,routine_id("onXML"))

Note that the XML event will be called multiple times for a single document.

Call eebax_StartDocument to start a new XML document, then call the XML creation routines until your document is finished (see below for a complete list of XML creation routines).

    eebax_StartDocument(eebax)
    -- Start a non-empty element (it must be closed with eebax_EndElement)
    eebax_StartElement(eebax,"toplevel",{{"attribute1","value1"},{"attribute2","value2"}},0)
    -- Start an empty element (last argument is 1, so no eebax_EndElement call)
    eebax_StartElement(eebax,"empty",{},1)
    eebax_StartElement(eebax,"secondlevel",{},0)
    eebax_Characters(eebax,"Character content")
    eebax_EndElement(eebax,"secondlevel")
    eebax_EndElement(eebax,"toplevel")
    eebax_EndDocument(eebax)

3 Reference

3.1 Events

For each of the events below there is an "eebax_Set[eventname]Event(integer hInst,integer RoutineID)" procedure.

StartDocument

onStartDocument(integer hInst)

This event is raised when a new document starts.

EndDocument

onEndDocument(integer hInst)

This event is raised when a document ends.

StartElement

onStartElement(integer hInst,sequence Uri, sequence LocalName, sequence QName, sequence Atts)

This event is raised when a new element starts. Atts is a sequence of attributes, for attribute x the following data can be retrieved:

Unless you are using XML namespaces, you should probably always use the QName (qualified name) and ignore the Uri and LocalName.

EndElement

onEndElement(integer hInst,sequence Uri, sequence LocalName, sequence QName)

This event is raised when an element ends. Unless you are using XML namespaces, you should probably always use the QName (qualified name) and ignore the Uri and LocalName.

Characters

onCharacters(integer hInst,sequence Chars)

This event is raised when character data is encountered.

Comment

onComment(integer hInst,sequence Comment)

This event is raised when a comment is encountered.

ParseError

onParseError(integer hInst, integer ErrorNumber, sequence Description, integer LineNumber)

This event is raised when a parsing error occurs.
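
A minimal sketch of a ParseError handler, assuming the setter follows the eebax_Set[eventname]Event naming pattern described at the top of this section (so eebax_SetParseErrorEvent) and that the library file is called eebax.e. The malformed input is only there to provoke an error.

    include eebax.e  -- assumed name of the EEBAX library file

    procedure onParseError(integer hInst, integer ErrorNumber, sequence Description, integer LineNumber)
        -- Report the error number, line and description on the console
        printf(1, "Parse error %d on line %d: %s\n", {ErrorNumber, LineNumber, Description})
    end procedure

    atom eebax
    eebax = eebax_NewInstance()
    eebax_SetParseErrorEvent(eebax, routine_id("onParseError"))

    eebax_StartDocument(eebax)
    if not eebax_Parse(eebax, "<a><b></a>") then
        -- eebax_Parse returned 0; the handler above has already reported the details
        puts(1, "Giving up on this document\n")
    end if
    eebax_DestroyInstance(eebax)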

IgnorableWhitespace

onIgnorableWhitespace(integer hInst, sequence Whitespace)

This event is raised when whitespace is encountered between elements. If you don't want to distinguish between whitespace and other character data, assign this event to the same handler as the Characters event.
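
Sharing one handler between the two events might look like the sketch below. It relies on both events passing the same parameter list (an instance handle and a sequence); the handler name onText and the file name eebax.e are assumptions made for illustration.

    include eebax.e  -- assumed name of the EEBAX library file

    -- One handler for both character data and whitespace between elements
    procedure onText(integer hInst, sequence Text)
        puts(1, Text)
    end procedure

    atom eebax
    integer id
    eebax = eebax_NewInstance()
    id = routine_id("onText")
    eebax_SetCharactersEvent(eebax, id)
    eebax_SetIgnorableWhitespaceEvent(eebax, id)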

ProcessingInstruction

onProcessingInstruction(integer hInst,sequence Target, sequence Data)

This event is raised when a processing instruction is encountered.

StartPrefixMapping

onStartPrefixMapping(integer hInst,sequence Prefix, sequence Uri)

This event is raised when a new namespace prefix mapping comes into scope. If you are not using XML namespaces (and most of the time even if you are) you should ignore this event.

EndPrefixMapping

onEndPrefixMapping(integer hInst,sequence Prefix, sequence Uri)

This event is raised when a namespace prefix mapping goes out of scope. If you are not using XML namespaces (and most of the time even if you are) you should ignore this event.

XML

onXML(integer hInst, sequence XML)

This event is raised whenever a piece of XML is parsed or an XML creation routine is called.
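
Because the handler simply receives each piece of XML as a sequence, it can send it anywhere. The sketch below writes the pieces to a file instead of the console; the file names output.xml and eebax.e and the variable outfn are assumptions made for illustration.

    include eebax.e  -- assumed name of the EEBAX library file

    integer outfn

    -- Called once for every piece of XML generated; appends it to the file
    procedure onXML(integer hInst, sequence XML)
        puts(outfn, XML)
    end procedure

    atom eebax

    outfn = open("output.xml", "wb")
    if outfn = -1 then
        puts(1, "Unable to open output.xml\n")
        abort(1)
    end if

    eebax = eebax_NewInstance()
    eebax_SetXMLEvent(eebax, routine_id("onXML"))

    eebax_StartDocument(eebax)
    eebax_StartElement(eebax, "note", {}, 0)
    eebax_Characters(eebax, "Written to a file")
    eebax_EndElement(eebax, "note")
    eebax_EndDocument(eebax)

    close(outfn)
    eebax_DestroyInstance(eebax)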

3.2 General

eebax_NewInstance

function eebax_NewInstance()

Creates a new parser instance and returns a handle to it.

See also: eebax_DestroyInstance(), eebax_ResetInstance()

eebax_DestroyInstance

procedure eebax_DestroyInstance(integer hInst)

Destroys the instance and all its data structures.

See also: eebax_NewInstance(), eebax_ResetInstance()

eebax_ResetInstance

procedure eebax_ResetInstance(integer hInst)

Resets the instance so that it can be used again. Once reset, the instance will behave exactly as if it had just been created with eebax_NewInstance, except that all the events that have been set will be left intact.

See also: eebax_NewInstance(), eebax_DestroyInstance()
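
One plausible use is recovering an instance after a failed parse, so that it can go on to handle the next document with its handlers still attached. This is only a sketch: the file name eebax.e is an assumption, and it assumes the mismatched tags in the first document are reported as an error.

    include eebax.e  -- assumed name of the EEBAX library file

    procedure onStartElement(integer hInst, sequence Uri, sequence LocalName, sequence QName, sequence Atts)
        puts(1, "Start Element: " & QName & "\n")
    end procedure

    atom eebax
    eebax = eebax_NewInstance()
    eebax_SetStartElementEvent(eebax, routine_id("onStartElement"))

    -- First document is malformed, so the instance is left part-way through it
    eebax_StartDocument(eebax)
    if eebax_Parse(eebax, "<a><b></a>") then
        eebax_EndDocument(eebax)
    else
        -- Abandon the broken document; the StartElement handler stays set
        eebax_ResetInstance(eebax)
    end if

    -- The instance can now start a fresh document
    eebax_StartDocument(eebax)
    if eebax_Parse(eebax, "<a><b/></a>") then
        eebax_EndDocument(eebax)
    end if
    eebax_DestroyInstance(eebax)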

eebax_StartDocument

procedure eebax_StartDocument(integer hInst)

Starts a new document (if you have already used this instance then you must have called either eebax_EndDocument() or eebax_ResetInstance() first). Always call this before using eebax_Parse() or any of the XML creation routines.

See also: eebax_EndDocument()

eebax_EndDocument

procedure eebax_EndDocument(integer hInst)

Ends the current document. It should be called after a document has been parsed or created, and will generate errors if the document is not complete.

See also: eebax_StartDocument()

3.3 Parsing

eebax_Parse

function eebax_Parse(integer hInst,sequence Data)

Parses XML Data and generates the appropriate events. Returns 1 if no error occurs while parsing Data, or 0 if an error occurs; use the ParseError event to get full error information. When reading UTF-16 encoded documents you must open the file in binary mode, *NOT* text mode. UTF-8 *SHOULD* be OK with either.
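
Data does not have to come from a file, and a document may be split across calls at any point. The sketch below feeds a small in-memory document to the parser in three pieces, with one split falling inside a tag name, much as a fixed-size file read might. The file name eebax.e and the variable names are assumptions made for illustration.

    include eebax.e  -- assumed name of the EEBAX library file

    procedure onCharacters(integer hInst, sequence Chars)
        puts(1, "Character Data: " & Chars & "\n")
    end procedure

    atom eebax
    sequence chunks
    eebax = eebax_NewInstance()
    eebax_SetCharactersEvent(eebax, routine_id("onCharacters"))

    -- One document, cut into arbitrary pieces
    chunks = {"<greeting>Hel", "lo</gree", "ting>"}

    eebax_StartDocument(eebax)
    for i = 1 to length(chunks) do
        if not eebax_Parse(eebax, chunks[i]) then
            puts(1, "XML document invalid\n")
            abort(1)
        end if
    end for
    eebax_EndDocument(eebax)
    eebax_DestroyInstance(eebax)

Since character data can be split between calls, a handler should not assume that it receives all the text of an element in a single Characters event.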

3.4 XML Creation

eebax_StartElement

procedure eebax_StartElement(integer hInst, sequence name, sequence atts, integer empty)

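Starts a new element. atts is a sequence of {name, value} pairs, and empty should be 1 for an empty element (one with no closing tag) or 0 otherwise.

Namespace declarations can be supplied in the attribute list.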

See also: eebax_EndElement()
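
A sketch of supplying a namespace declaration as an ordinary attribute. The prefix x, the URI http://example.com/ns and the file name eebax.e are placeholders chosen for illustration, not values defined by EEBAX.

    include eebax.e  -- assumed name of the EEBAX library file

    procedure onXML(integer hInst, sequence XML)
        puts(1, XML)
    end procedure

    atom eebax
    eebax = eebax_NewInstance()
    eebax_SetXMLEvent(eebax, routine_id("onXML"))

    eebax_StartDocument(eebax)
    -- The xmlns:x attribute declares the prefix used by the element names
    eebax_StartElement(eebax, "x:doc", {{"xmlns:x", "http://example.com/ns"}}, 0)
    -- Empty element: last argument is 1, so there is no eebax_EndElement call
    eebax_StartElement(eebax, "x:item", {{"id", "1"}}, 1)
    eebax_EndElement(eebax, "x:doc")
    eebax_EndDocument(eebax)
    eebax_DestroyInstance(eebax)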

eebax_EndElement

procedure eebax_EndElement(integer hInst, sequence name)

Closes an element. name is the qualified name of the element being closed.

See also: eebax_StartElement()

eebax_Characters

procedure eebax_Characters(integer hInst, sequence Chars)

Adds Chars to the current document as character data. Chars should be unescaped character data.

eebax_ProcessingInstruction

procedure eebax_ProcessingInstruction(integer hInst, sequence Target, sequence Data)

Generates a processing instruction.

eebax_Comment

procedure eebax_Comment(integer hInst, sequence Comment)

Adds Comment to the current document as a comment.
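
The creation routines in this section can be combined as in the sketch below. The processing instruction target and data, the comment text and the file name eebax.e are made up for illustration. The character data is passed unescaped, on the understanding from the eebax_Characters description that escaping of characters such as < and & is left to EEBAX.

    include eebax.e  -- assumed name of the EEBAX library file

    procedure onXML(integer hInst, sequence XML)
        puts(1, XML)
    end procedure

    atom eebax
    eebax = eebax_NewInstance()
    eebax_SetXMLEvent(eebax, routine_id("onXML"))

    eebax_StartDocument(eebax)
    eebax_ProcessingInstruction(eebax, "xml-stylesheet", "type=\"text/css\" href=\"style.css\"")
    eebax_StartElement(eebax, "doc", {}, 0)
    eebax_Comment(eebax, "Generated by EEBAX")
    -- Passed as plain text, not as &lt; and &amp;
    eebax_Characters(eebax, "a < b & c")
    eebax_EndElement(eebax, "doc")
    eebax_EndDocument(eebax)
    eebax_DestroyInstance(eebax)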

3.5 Utilities

eebax_EncodeUTF8

function eebax_EncodeUTF8(sequence Data)

Encodes a Unicode string, in which each character occupies one element of a sequence, into a UTF-8 encoded string. If an error is encountered, an atom containing the position of the character that caused the error is returned instead. You do not need to use this function directly; it is called automatically from most of the XML creation routines.

See also: eebax_DecodeUTF8()
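
Since the result is either a sequence (success) or an atom (error position), a caller that does use the function directly would need to test the type of the result. A minimal sketch, with the file name eebax.e and the sample string as assumptions:

    include eebax.e  -- assumed name of the EEBAX library file

    object encoded

    -- 'H', e-acute and the euro sign, one code point per element
    encoded = eebax_EncodeUTF8({#48, #E9, #20AC})

    if atom(encoded) then
        -- An atom means failure: it holds the position of the bad character
        printf(1, "Cannot encode character at position %d\n", {encoded})
    else
        -- A sequence means success: it holds the encoded string
        printf(1, "Encoded into %d elements\n", {length(encoded)})
    end if

The result is held in an object variable because a sequence variable would cause a type check failure if the function returned an atom.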

eebax_DecodeUTF8

function eebax_DecodeUTF8(sequence Data)

Decodes a UTF-8 encoded string so that each character occupies one element of a sequence. You do not need to use this function directly; it is called automatically from the eebax_Parse routine.

See also: eebax_EncodeUTF8()

eebax_EncodeUTF16

function eebax_EncodeUTF16(sequence Data)

Encodes a Unicode string, in which each character occupies one element of a sequence, into a UTF-16 encoded string. If an error is encountered, an atom containing the position of the character that caused the error is returned instead. You do not need to use this function directly; it is called automatically from most of the XML creation routines.

See also: eebax_DecodeUTF16()

eebax_DecodeUTF16

function eebax_DecodeUTF16(sequence Data)

Decodes a UTF-16 encoded string so that each character occupies one element of a sequence. The first character of Data should be the #FEFF marker (the byte order mark, or BOM). You do not need to use this function directly; it is called automatically from the eebax_Parse routine.

See also: eebax_EncodeUTF16()