EEBAX is my second attempt at an XML parser. My first was xml.e (available on the Euphoria archive page), which was not very good at all (it was slow, not at all compliant, etc.). EEBAX should be a lot better :)
EEBAX is an event-based parser: for each piece of XML it encounters (for example an opening or closing tag, or some character data) it generates an event which the calling application can process. This is a lot more flexible than loading the entire XML document into a structure in memory (or a sequence) and returning that. The application can build its own structures using the events (or use xmltree.e).
EEBAX can also be used to create XML (using the onXML event).
See example1.ex.
To use EEBAX you must first create a parser instance. You can do this with the eebax_NewInstance function:
atom eebax
eebax = eebax_NewInstance()
Each instance can only process one document at a time, but you can create as many instances as you need.
Now add some event handlers for the events you wish to process (see below for a full list of available events). These will be called as the document is parsed:
procedure onStartElement(integer hInst, sequence Uri, sequence LocalName, sequence QName, sequence Atts)
    puts(1,"Start Element: " & QName & "\n")
end procedure
procedure onEndElement(integer hInst, sequence Uri, sequence LocalName, sequence QName)
    puts(1,"End Element: " & QName & "\n")
end procedure
procedure onCharacters(integer hInst, sequence Chars)
    puts(1,"Character Data: " & Chars & "\n")
end procedure
Next assign the events to the appropriate handlers:
eebax_SetStartElementEvent(eebax,routine_id("onStartElement"))
eebax_SetEndElementEvent(eebax,routine_id("onEndElement"))
eebax_SetCharactersEvent(eebax,routine_id("onCharacters"))
When you are ready to start processing an XML document, call the eebax_StartDocument procedure. This prepares the instance to receive a document:
eebax_StartDocument(eebax)
Next call eebax_Parse one or more times until you have passed the whole document to it. As you call eebax_Parse, the events you defined above will be called appropriately. eebax_Parse will normally return 1, but if there is a parser error it will return 0 (in addition to calling the ParseError event if it is handled).
include get.e -- get_bytes() is defined in get.e
atom fn
sequence in
fn = open("test.xml","rb")
if fn = -1 then
    puts(1,"Unable to open test.xml")
    abort(1)
end if
while 1 do
    -- We're reading in one kilobyte at a time from the file
    in = get_bytes(fn,1024)
    if not eebax_Parse(eebax,in) then
        puts(1,"XML document invalid")
        abort(1)
    end if
    -- A short read means we have reached the end of the file
    if length(in) < 1024 then
        exit
    end if
end while
-- Close the file
close(fn)
When you have passed the whole document to eebax_Parse, call eebax_EndDocument. The EEBAX instance will be set back to the state it was in when you first created it and can be used to parse further XML documents.
eebax_EndDocument(eebax)
Finally, when you no longer have any use for the parser instance you should destroy it with eebax_DestroyInstance:
eebax_DestroyInstance(eebax)
This will free up any resources used.
See example2.ex
To have EEBAX create XML for you, first create an instance of the parser as you would for parsing a document, then create a handler to receive the output XML:
atom eebax
eebax = eebax_NewInstance()
-- All this example handler does is print the XML to the console,
-- however it could be written to do something else with it
procedure onXML(integer hInst, sequence XML)
    puts(1,XML)
end procedure
eebax_SetXMLEvent(eebax,routine_id("onXML"))
Note that the XML event will be called multiple times for a single document.
Call eebax_StartDocument to start a new XML document, then call XML creation routines until your document is finished (see below for a complete list of XML creation routines).
eebax_StartDocument(eebax)
-- Start a non-empty (must have closing tag) element
eebax_StartElement(eebax,"toplevel",{{"attribute1","value1"},{"attribute2","value2"}},0)
-- Start an empty element (no closing tag needed)
eebax_StartElement(eebax,"empty",{},1)
eebax_StartElement(eebax,"secondlevel",{},0)
eebax_Characters(eebax,"Character content")
eebax_EndElement(eebax,"secondlevel")
eebax_EndElement(eebax,"toplevel")
eebax_EndDocument(eebax)
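Because the XML event is raised several times per document, a handler that needs the complete output has to collect the pieces itself. The following is only a sketch of one way to do that: the include file name eebax.e, the variable name buffer and the output file name out.xml are my assumptions, not something the library dictates.

```euphoria
include eebax.e  -- assumed include file name for the EEBAX library

sequence buffer  -- hypothetical accumulator for the generated XML
buffer = ""

-- Append each piece of generated XML to the buffer
procedure onXML(integer hInst, sequence XML)
    buffer = buffer & XML
end procedure

atom eebax
integer fn

eebax = eebax_NewInstance()
eebax_SetXMLEvent(eebax, routine_id("onXML"))

eebax_StartDocument(eebax)
eebax_StartElement(eebax, "note", {}, 0)
eebax_Characters(eebax, "Hello")
eebax_EndElement(eebax, "note")
eebax_EndDocument(eebax)
eebax_DestroyInstance(eebax)

-- Write the collected document out in one go (binary mode)
fn = open("out.xml", "wb")
if fn != -1 then
    puts(fn, buffer)
    close(fn)
end if
```

Writing each piece straight to the file from inside the handler would work just as well and avoids holding the whole document in memory.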
For each of the events below there is an "eebax_Set[eventname]Event(integer hInst, integer RoutineID)" procedure.
StartDocument
onStartDocument(integer hInst)
This event is raised when a new document starts
onEndDocument(integer hInst)
This event is raised when a document ends
onStartElement(integer hInst,sequence Uri, sequence LocalName, sequence QName, sequence Atts)
This event is raised when a new element starts. Atts is a sequence of attributes; for attribute x the following data can be retrieved:
Atts[x][EEBAX_ATTS_QNAME] - The qualified name of the attribute
Atts[x][EEBAX_ATTS_VALUE] - The value of the attribute
Atts[x][EEBAX_ATTS_URI] - The URI for the attribute
Atts[x][EEBAX_ATTS_LOCALNAME] - The LocalName for the attribute
Unless you are using XML namespaces you should probably always use the QName (qualified name) and ignore the Uri and LocalName.
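As a rough illustration of reading the attribute list, here is a StartElement handler that prints each attribute by its qualified name. It is a sketch only; the include file name eebax.e is an assumption, and the handler ignores Uri and LocalName as suggested for documents that do not use namespaces.

```euphoria
include eebax.e  -- assumed include file name for the EEBAX library

-- Print the element name followed by each attribute as name="value"
procedure onStartElement(integer hInst, sequence Uri, sequence LocalName,
                         sequence QName, sequence Atts)
    puts(1, "Element: " & QName & "\n")
    for x = 1 to length(Atts) do
        printf(1, "  %s=\"%s\"\n",
               {Atts[x][EEBAX_ATTS_QNAME], Atts[x][EEBAX_ATTS_VALUE]})
    end for
end procedure

atom eebax
eebax = eebax_NewInstance()
eebax_SetStartElementEvent(eebax, routine_id("onStartElement"))
```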
onEndElement(integer hInst,sequence Uri, sequence LocalName, sequence QName)
This event is raised when an element ends. Unless you are using XML namespaces you should probably always use the QName (qualified name) and ignore the Uri and LocalName.
Characters
onCharacters(integer hInst,sequence Chars)
This event is raised when character data is encountered
onComment(integer hInst,sequence Comment)
This event is raised when a comment is encountered.
onParseError(integer hInst, integer ErrorNumber, sequence Description, integer LineNumber)
This event is raised when a parsing error occurs.
ErrorNumber - numerical error code
Description - a textual description that could be shown to a user
LineNumber - the current line number, can be used for error reporting
onIgnorableWhitespace(integer hInst, sequence Whitespace)
This event is raised when white space is encountered between elements. If you don't want to distinguish between whitespace and other character data then you should assign this event to the same handler as the Characters event.
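Sharing one handler between the two events might look like the sketch below. The setter name eebax_SetIgnorableWhitespaceEvent is inferred from the eebax_Set[eventname]Event pattern rather than quoted from the library, and the include file name and the handler name onText are assumptions.

```euphoria
include eebax.e  -- assumed include file name for the EEBAX library

-- One handler for both character data and whitespace between elements
procedure onText(integer hInst, sequence Chars)
    puts(1, "Text: " & Chars & "\n")
end procedure

atom eebax
integer textHandler

eebax = eebax_NewInstance()
textHandler = routine_id("onText")

eebax_SetCharactersEvent(eebax, textHandler)
-- Setter name assumed from the eebax_Set[eventname]Event pattern
eebax_SetIgnorableWhitespaceEvent(eebax, textHandler)
```

This works because both events pass the same kind of arguments (the instance handle and one sequence).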
onProcessingInstruction(integer hInst,sequence Target, sequence Data)
This event is raised when a processing instruction is encountered.
onStartPrefixMapping(integer hInst,sequence Prefix, sequence Uri)
This event is raised when a new namespace prefix mapping comes into scope. If you are not using XML namespaces (and most of the time even if you are) you should ignore this event.
onEndPrefixMapping(integer hInst,sequence Prefix, sequence Uri)
This event is raised when a namespace prefix mapping goes out of scope. If you are not using XML namespaces (and most of the time even if you are) you should ignore this event.
onXML(integer hInst, sequence XML)
This event is raised whenever a piece of XML is parsed or an XML creation routine is called.
function eebax_NewInstance()
Creates a new parser instance and returns a handle
See also: eebax_DestroyInstance(), eebax_ResetInstance()
procedure eebax_DestroyInstance(integer hInst)
Destroys the instance and all its data structures
See also: eebax_NewInstance(), eebax_ResetInstance()
procedure eebax_ResetInstance(integer hInst)
Resets the instance so that it can be used again. Once reset, the instance will behave exactly as if it had just been created with eebax_NewInstance, except that all the events that have been set will be left intact.
See also: eebax_NewInstance(), eebax_DestroyInstance()
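One place a reset seems useful is after a failed parse, where the document is abandoned half way through but the same instance and handlers should be reused. The helper below is a hypothetical sketch (parse_string is not part of EEBAX, and the include file name is assumed); it relies only on the rule that an instance must see eebax_EndDocument or eebax_ResetInstance before the next eebax_StartDocument.

```euphoria
include eebax.e  -- assumed include file name for the EEBAX library

-- Hypothetical helper: parse a whole document held in memory.
-- Returns 1 on success, 0 if eebax_Parse reported an error.
function parse_string(atom inst, sequence xml)
    integer ok
    eebax_StartDocument(inst)
    ok = eebax_Parse(inst, xml)
    if ok then
        eebax_EndDocument(inst)
    else
        -- Abandon the broken document but keep the event handlers
        eebax_ResetInstance(inst)
    end if
    return ok
end function

atom eebax
eebax = eebax_NewInstance()

if not parse_string(eebax, "<a><b></a>") then
    puts(1, "First document rejected\n")
end if

-- The same instance can now be used for another document
if parse_string(eebax, "<a><b/></a>") then
    puts(1, "Second document accepted\n")
end if

eebax_DestroyInstance(eebax)
```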
procedure eebax_StartDocument(integer hInst)
Starts a new document (if you have already used this instance then you must have called either eebax_EndDocument() or eebax_ResetInstance()). Always call this before using eebax_Parse() or any of the XML creation routines.
See also: eebax_EndDocument()
procedure eebax_EndDocument(integer hInst)
Ends the current document. Should be called after a document has been parsed or created. It will generate errors if the document is not complete.
See also: eebax_StartDocument()
function eebax_Parse(integer hInst,sequence Data)
Parses XML Data and generates the appropriate events. Returns 1 if no error occurs while parsing Data, or 0 if an error occurs; use the ParseError event to get full error information. When reading UTF-16 encoded documents you must open the file in binary mode, *NOT* text mode; UTF-8 *SHOULD* be OK with either.
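To see what went wrong when eebax_Parse returns 0, a ParseError handler can be attached before parsing. This is a sketch under a couple of assumptions: the setter name eebax_SetParseErrorEvent follows the eebax_Set[eventname]Event pattern, the include file name is a guess, and the deliberately mismatched tags are only there to provoke an error.

```euphoria
include eebax.e  -- assumed include file name for the EEBAX library

-- Report the details that the return value of eebax_Parse does not carry
procedure onParseError(integer hInst, integer ErrorNumber,
                       sequence Description, integer LineNumber)
    printf(1, "XML error %d on line %d: %s\n",
           {ErrorNumber, LineNumber, Description})
end procedure

atom eebax
eebax = eebax_NewInstance()
-- Setter name assumed from the eebax_Set[eventname]Event pattern
eebax_SetParseErrorEvent(eebax, routine_id("onParseError"))

eebax_StartDocument(eebax)
if not eebax_Parse(eebax, "<a><b></a>") then
    puts(1, "Parsing stopped\n")
end if

eebax_DestroyInstance(eebax)
```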
procedure eebax_StartElement(integer hInst, sequence name, sequence atts, integer empty)
Starts a new element
name - a qualified name for the new element
atts - a list of attributes in the form of a sequence of {name,value} sequences where name is a qualified name
empty - a boolean value to indicate if the tag is empty (in which case it will be output in the form <name atts/>). If the empty flag is true then eebax_EndElement should not and can not be called for this element
Namespace declarations can be supplied in the attribute list.
See also: eebax_EndElement()
procedure eebax_EndElement(integer hInst, sequence name)
Closes an element, name is a qualified name for the element being closed
See also: eebax_StartElement()
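A small sketch of eebax_StartElement and eebax_EndElement with a namespace declaration passed as an ordinary attribute. The prefix ex, the URI http://example.org/orders, the element names and the include file name are all made up for illustration.

```euphoria
include eebax.e  -- assumed include file name for the EEBAX library

-- Print the generated XML as it is produced
procedure onXML(integer hInst, sequence XML)
    puts(1, XML)
end procedure

atom eebax
eebax = eebax_NewInstance()
eebax_SetXMLEvent(eebax, routine_id("onXML"))

eebax_StartDocument(eebax)
-- The namespace declaration is just another {name,value} pair
eebax_StartElement(eebax, "ex:order",
                   {{"xmlns:ex", "http://example.org/orders"}, {"id", "42"}}, 0)
-- empty = 1: no eebax_EndElement call for this element
eebax_StartElement(eebax, "ex:item", {{"sku", "A1"}}, 1)
eebax_EndElement(eebax, "ex:order")
eebax_EndDocument(eebax)

eebax_DestroyInstance(eebax)
```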
procedure eebax_Characters(integer hInst, sequence Chars)
Adds Chars to the current document as character data. Chars should be unescaped character data.
procedure eebax_ProcessingInstruction(integer hInst, sequence Target, sequence Data)
Generates a processing instruction
procedure eebax_Comment(integer hInst, sequence Comment)
Adds Comment to the current document as a comment.
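The remaining creation routines can be mixed freely with elements. The sketch below uses an invented processing instruction target (app-hint) and an assumed include file name. The character data is passed unescaped, as described above; I am assuming the library takes care of escaping it in the output.

```euphoria
include eebax.e  -- assumed include file name for the EEBAX library

-- Print the generated XML as it is produced
procedure onXML(integer hInst, sequence XML)
    puts(1, XML)
end procedure

atom eebax
eebax = eebax_NewInstance()
eebax_SetXMLEvent(eebax, routine_id("onXML"))

eebax_StartDocument(eebax)
eebax_StartElement(eebax, "report", {}, 0)
eebax_Comment(eebax, " generated file, do not edit ")
eebax_ProcessingInstruction(eebax, "app-hint", "mode=fast")
-- Pass the raw text, not "a &lt; b &amp; c"
eebax_Characters(eebax, "a < b & c")
eebax_EndElement(eebax, "report")
eebax_EndDocument(eebax)

eebax_DestroyInstance(eebax)
```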
function eebax_EncodeUTF8(sequence Data)
Encodes a Unicode string, where each character occupies one element of a sequence, into a UTF-8 encoded string. If an error is encountered then an atom containing the position of the character that caused the error is returned. You do not need to use this function directly; it is called automatically from most of the XML creation routines.
See also: eebax_DecodeUTF8()
function eebax_DecodeUTF8(sequence Data)
Decodes a UTF-8 encoded string so that each character occupies one element of a sequence. You do not need to use this function directly; it is called automatically from the eebax_Parse routine.
See also: eebax_EncodeUTF8()
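Although these two functions are normally called for you, the error convention of eebax_EncodeUTF8 (an atom instead of a sequence) is easy to check by hand. This is a sketch with an assumed include file name; the sample text is three arbitrary code points, and the decoded result is tested with sequence() because the documentation does not say what eebax_DecodeUTF8 returns on bad input.

```euphoria
include eebax.e  -- assumed include file name for the EEBAX library

sequence text
object encoded, decoded

-- One element per character: 'H', e-acute, euro sign
text = {#48, #E9, #20AC}

encoded = eebax_EncodeUTF8(text)
if atom(encoded) then
    -- An atom result is the position of the offending character
    printf(1, "Cannot encode character at position %d\n", {encoded})
else
    printf(1, "%d characters became %d bytes\n",
           {length(text), length(encoded)})
    decoded = eebax_DecodeUTF8(encoded)
    if sequence(decoded) then
        if compare(decoded, text) = 0 then
            puts(1, "Round trip OK\n")
        end if
    end if
end if
```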
function eebax_EncodeUTF16(sequence Data)
Encodes a Unicode string, where each character occupies one element of a sequence, into a UTF-16 encoded string. If an error is encountered then an atom containing the position of the character that caused the error is returned. You do not need to use this function directly; it is called automatically from most of the XML creation routines.
See also: eebax_DecodeUTF16()
function eebax_DecodeUTF16(sequence Data)
Decodes a UTF-16 encoded string so that each character occupies one element of a sequence. The first character of Data should be the #FEFF marker (BOM). You do not need to use this function directly; it is called automatically from the eebax_Parse routine.
See also: eebax_EncodeUTF16()