Module XML DOM

The W4 project

 

 

(c) Carlos Viegas Damásio, November 2003

 

1. Description
This module supports the construction of the W4 XML term representation, according to XML Info Sets. XML Namespaces and XML Base are fully supported.

The fundamental implementation decisions are the following:

  • All names occurring in an XML document are represented by Prolog constants, in UTF-8 encoding.
  • All text content and attribute values are represented by lists of Unicode character codes.
  • The properties that reference other information items ([parent], [references], [notation] and [owner element]) are not implemented, because some Prolog systems do not support cyclic terms.
    However, an extension of this module conforming to the full recommendation is planned.
  • XML Base and SystemID URIs (and IRIs) have a special internal structure, as defined in module IRI, in order to optimize resolution of relative references.
  • In most situations, properties with an unknown value or no value are represented by empty lists.

In order to maintain compatibility for subsequent versions of our parser, all applications should use the described predicates to extract properties from the information items. For examples, the reader is referred to the implementation of XML Exclusive Canonicalization and XML Term NS.

 

2. Representation of an XML document

The W4 parser creates a term structure containing the full representation of the read XML document. This information is stored in a term of the form document/10, from which every information item in the XML is accessible. The constructed representation differs in some minor aspects from the XML Info Sets, in particular by constructing an internal representation for the DTD. The documentation is adapted from the specification of XML Info Sets. In the following tables, text in bold face represents items or properties according to the XML Info Sets.

2.1 The document and its content

The Document Information Item
document(Children,DocumentElement,Notations,Unparsed,BaseURI,CharacterEncoding,Standalone,Version,All,DTD)
Children: An ordered NodeList of child information items, in document order. There is only one element information item in this list (the Document Element). This list also contains all processing instruction items, comment items, and the Document Type Declaration item occurring in the prolog and epilog of the XML document.
DocumentElement: The element information item corresponding to the document element.
Notations: An ordered NamedMap of notation information items, one for each notation declared in the DTD.
The ordering key is the notation name.
Unparsed: An ordered NamedMap of unparsed entity information items, one for each unparsed entity declared in the DTD.
The ordering key is the name of the unparsed entity.
BaseURI: The base URI term of the document entity, according to the IRI term representation.
CharacterEncoding: A constant with the name of the character encoding scheme in which the document entity is expressed.
Standalone: An indication of the standalone status of the document, either the constant yes or no.
If there is no standalone document declaration, then this argument is set to the empty list.
Version: A constant representing the XML version of the document (currently, only '1.0').
The empty list [] if there is no XML declaration.
All: The constant yes or no indicating whether the processor has read the complete DTD.
The empty list [] if there is no DTD.
DTD: The Document Type Declaration term representing the full DTD.
The empty list [] if there is no DTD.


 

Element Information Items
element(NamespaceURI, LocalName, Prefix, Attributes, NameAttributes, Children, InScope, BaseURI, Lang)
NamespaceURI: A constant with the namespace name, if any, of the element type.
The empty constant '' if the element does not belong to a namespace.
LocalName: A constant representing the local part of the element-type name.
Prefix: The namespace prefix part of the element-type name.
The empty constant '' if the name is unprefixed.
Attributes: An ordered NamedMap of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element.
The map is ordered by the key  ename(NamespaceURI,LocalName) obtained from the NamespaceURI and LocalName of each attribute.
NameAttributes: An ordered NamedMap of attribute information items, one for each of the namespace declarations (specified or defaulted from the DTD) of this element. A declaration of the form xmlns="", which undeclares the default namespace, counts as a namespace declaration. By definition, all namespace attributes have a namespace URI of http://www.w3.org/2000/xmlns/
The map is ordered by the key ename('http://www.w3.org/2000/xmlns/',Prefix), where Prefix is the constant corresponding to the declared prefix by the namespace attribute. Prefix is the empty constant '' if the declaration is of the form xmlns="[SOME URI]".
Children: An ordered Node List of child information items, in document order. This list contains element, processing instruction, unexpanded entity reference, character, and comment information items, one for each element, processing instruction, reference to an unprocessed external entity, data character, and comment appearing immediately within the current element.
InScope: An ordered NamedMap of namespace URIs, one for each of the namespaces in effect for this element. The map is ordered by Prefix, and maps the Prefix to the namespace URI (a namespace information item). The map always contains an item with prefix xml, which is implicitly bound to the namespace name http://www.w3.org/XML/1998/namespace.
Furthermore, and deviating from XML Info Sets, the map also contains a value for the empty prefix '', corresponding to the default namespace. If there is no default namespace declared the value for '' is also the empty constant.
BaseURI: The base URI term of the element, according to the IRI term representation.
Lang: The language tag in effect for the element, represented by a list of Unicode character codes. This might have been declared in the element or inherited from an ancestor element declaration.

The current implementation does not support the [parent] property of Element Information Items.

Attribute Information Items
attribute(NamespaceURI, LocalName, Prefix, Value, Specified, Type)
NamespaceURI: A constant with the namespace name, if any, of the attribute.
The empty constant '' if the attribute does not belong to a namespace.
LocalName: A constant representing the local part of the attribute name.
Prefix: The namespace prefix part of the attribute name.
The empty constant '' if the name is unprefixed.
Value: A list of Unicode character codes with the normalized attribute value.
Specified: The constant yes if this attribute was actually specified in the start-tag of its element; the constant no if it was defaulted from the DTD.
Type: The type declared for this attribute in the DTD. Currently, it is always set to the empty list.
The efficiency impact of supporting this property for documents without DTDs is being evaluated.

The current implementation does not support the [references] and [owner element] properties of Attribute Information Items.

Processing Instruction Information Items
pi(Target,Content,BaseURI)
Target: A constant representing the target part of the processing instruction.
Content: A list of Unicode character codes representing the content of the processing instruction, excluding the target and any white space immediately following it.
BaseURI: The base URI term of the PI, according to the IRI term representation.

The current implementation does not support the [notation] and [parent] properties of Processing Instruction Information Items.

Comment Information Items
comment(Content)
Content: A list of Unicode character codes representing the content of the comment.

The current implementation does not support the [parent] property of Comment Information Items.

Character Information Items
pcdata(Content) or whitespace(Content)
Content: A list of Unicode character codes representing text content. If the content is white space appearing within element content, then the function symbol is whitespace; otherwise it is pcdata.

The current implementation does not support the [parent] property of Character Information Items.

2.2 The Document Type Declaration

In this section we describe the Document Type Declaration Information Item and the representation of element specifications and attribute declarations. Notice that XML Info Sets does not specify items for representing information inside the DTD, besides PIs.

Document Type Declaration Information Item
documenttype(QName, PublicId, SystemId, ElemDecl, AttDecl, Children )
QName: A term of the form qname(Prefix,LocalName) representing the document element qualified name, as it appears in the DOCTYPE declaration.
PublicId: A constant with the public identifier of the external subset, as it appears in the DOCTYPE declaration.
SystemId: An IRI ref term with the system identifier of the external subset, as it appears in the DOCTYPE declaration. The empty list if a system identifier is not provided.
ElemDecl: An ordered NamedMap of element content specification items, one for each element type declaration found in the internal subset of the Document Type Declaration.
The map is ordered by the key qname(Prefix,LocalName) obtained from the element's qualified name being declared.
AttDecl: An ordered NamedMap of attribute list declarations, one for each element with declared attributes found in the internal subset of the DTD.
The map is ordered by the key qname(Prefix,LocalName) obtained from the element's qualified name.
Children: An ordered NodeList of processing instruction information items appearing in the DTD, in document order.

The current implementation does not support the [parent] property of the Document Type Declaration Information Item.

Element Content Specification Items
spec(ContentSpec)
ContentSpec: A term representing the content specification of an element. This term has one of the following forms:
  • the constant empty
  • the constant any
  • the constant '#pcdata'
  • the term times( seq(['#pcdata']) )
  • the term times( choice( ['#pcdata'|Names])), where Names is a non-empty list of qualified names of the form qname(Prefix,LocalName).
  • a term CP, times(CP), plus(CP), or opt(CP), where CP is a CP term of the form choice(ListOfCP) or seq(ListOfCP), as described below.

A CP term represents a name, a choice, or a sequence in the content specification and has one of the following forms:

  • a qualified name term qname(Prefix,LocalName).
  • a term choice(ListOfCP), where ListOfCP is a list of CP terms.
  • a term seq(ListOfCP), where ListOfCP is a list of CP terms.
  • a term of the form times(CP), plus(CP), opt(CP), where CP is a CP term.
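
The recursive shape of content specification terms can be walked with an ordinary structural recursion. The following is only a sketch: cp_names/2, cp_list_names/2 and cp_append/3 are hypothetical helper names, not part of the module, and the example term assumes that an unprefixed name in the DTD is represented as qname('',LocalName).

```prolog
% cp_names(+Spec, -Names): the qualified names mentioned in a content
% specification term, left to right, duplicates kept.
% Hypothetical helper, not exported by the XML DOM module.
cp_names(empty, []).
cp_names(any, []).
cp_names('#pcdata', []).
cp_names(qname(Prefix, Local), [qname(Prefix, Local)]).
cp_names(times(CP), Names) :- cp_names(CP, Names).
cp_names(plus(CP), Names) :- cp_names(CP, Names).
cp_names(opt(CP), Names) :- cp_names(CP, Names).
cp_names(choice(CPs), Names) :- cp_list_names(CPs, Names).
cp_names(seq(CPs), Names) :- cp_list_names(CPs, Names).

% cp_list_names(+ListOfCP, -Names): names of every CP term in a list.
cp_list_names([], []).
cp_list_names([CP|CPs], Names) :-
    cp_names(CP, Names1),
    cp_list_names(CPs, Names2),
    cp_append(Names1, Names2, Names).

% Local list concatenation, so that no library import is needed.
cp_append([], Ys, Ys).
cp_append([X|Xs], Ys, [X|Zs]) :- cp_append(Xs, Ys, Zs).
```

Under the stated assumption, a declaration such as <!ELEMENT a (b, (c|d)*, e?)> would correspond to the term seq([qname('',b), times(choice([qname('',c),qname('',d)])), opt(qname('',e))]), for which cp_names/2 yields the names b, c, d and e in that order.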

The attribute list declaration item collects in a single structure all the attribute declarations found in the DTD for a given element. Notice that an attribute declaration is allowed without a corresponding element type declaration. Conversely, an element type declaration without any attribute declaration means that no attributes may appear in that element.

Attribute List Declaration Items
attlist(AttDecl)
AttDecl: AttDecl is an ordered NamedMap of terms of the form attribute_decl(Type,Default). The map is ordered by the key qname(Prefix,LocalPart) obtained from the attribute qualified name in the DTD.

The Type argument is a term of the form:

  • one of the constants cdata, id, idref, idrefs, entity, entities, nmtoken, nmtokens
  • enum(ListOfNmtokens), where ListOfNmtokens is a list of constants corresponding to the name tokens (constants in UTF-8 encoding) found in the attribute declaration.
  • notations(ListOfNCNames), where ListOfNCNames is a list of NCNames (constants in UTF-8 encoding).

The Default argument is a term of the form:

  • the constant required or implied
  • the term fixed(Value), where Value is a list of Unicode character codes representing the fixed attribute value.
  • the term default(Value), where Value is a list of Unicode character codes representing the default attribute value.
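
As an illustration of how the Type and Default terms above might be consumed, the following sketch extracts the value supplied by the DTD and checks a value against an enumerated type. The predicate names dtd_supplied_value/2, enum_allows/2 and in_list/2 are hypothetical and are not part of the module.

```prolog
% dtd_supplied_value(+Default, -Codes): the value the DTD supplies for an
% attribute, if any. Fails for the constants required and implied.
dtd_supplied_value(fixed(Value), Value).
dtd_supplied_value(default(Value), Value).

% enum_allows(+Type, +Codes): Type is enum(Tokens) and the code list
% Codes spells one of the declared name tokens.
enum_allows(enum(Tokens), Codes) :-
    atom_codes(Token, Codes),
    in_list(Token, Tokens).

% in_list(+X, +List): deterministic membership test.
in_list(X, [X|_]) :- !.
in_list(X, [_|Rest]) :- in_list(X, Rest).
```

For example, with Type = enum([left,right]) the check succeeds for the character codes of left and fails for those of center.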

2.3 Entities and Notations

The XML Info Sets only requires storing the unparsed entities and notations declared in the DTD. Internal parsed entities and parameter entities are properly dealt with by the XML Parser, but no representation of them is made accessible to the user.

There is a notation information item for each notation declared in the DTD.

Notation Information Items
notation(Name, PublicId, SystemId, BaseURI)
Name: A constant with the XML Name of the notation.
PublicId: A constant with the public identifier of the notation.
The empty list if the public identifier is not provided.
SystemId: The system identifier URI, according to the IRI term representation.
The empty list if a system identifier is not provided.
BaseURI: The base URI relative to which the system identifier should be resolved, according to the IRI term representation.

Unparsed entity information items are stored in the document information item. There is one unparsed entity information item for each unparsed general entity declared in the DTD. Unparsed entities are not expanded in attribute values since they are not read. The [notation] property of Unparsed Entity Information Items is not supported.

Unparsed Entity Information Items
unparsed_entity(Name, PublicId, SystemId, BaseURI,NotationName)
Name: A constant with the XML Name of the unparsed entity.
PublicId: A constant with the public identifier of the unparsed entity.
The empty list if the public identifier is not provided.
SystemId: The system identifier URI, according to the IRI term representation.
The empty list if a system identifier is not provided.
BaseURI: The base URI relative to which the system identifier should be resolved, according to the IRI term representation.
NotationName: The notation name associated with the entity.

An unexpanded entity reference information item serves as a placeholder by which the XML processor indicates that it has not expanded an external parsed entity. There is such an information item for each unexpanded reference to an external general entity within the content of an element. The [parent] property of Unexpanded Entity Reference Information Items is not supported.

Unexpanded Entity Reference Information Items
unexpanded_entity(Name, PublicId, SystemId, BaseURI)
Name: A constant with the XML Name of the external parsed entity.
PublicId: A constant with the public identifier of the external parsed entity.
The empty list if the public identifier is not provided.
SystemId: The system identifier URI, according to the IRI term representation.
The empty list if a system identifier is not provided.
BaseURI: The base URI relative to which the system identifier should be resolved, according to the IRI term representation.

 

 
3. Usage of the XML DOM Module

The XML DOM module is expected to be used in connection with the W4 XML Parser. Currently, it is not the intent of the XML DOM module to define an API for dynamically constructing XML terms.  Therefore, only inspection predicates are described in this section, even though the current implementation exports "low-level" predicates for constructing XML DOM terms. These are for internal use by the XML Parser and should not be used in applications. A full-blown API is being devised.

3.1 Working with the document and its content

The following predicates allow the users to perform most of the tasks required in applications. The typical application extracts the document (or root) element from the document term and starts processing. The first set of predicates implement operations to

Document Item Terms

  • isDocument( +Item )

    Succeeds if the argument is a document item term.
     
  • getDocumentChildren( +XMLDoc, ChildList )

    This predicate returns in the ChildList argument an ordered NodeList containing the child information item terms, in document order, of the XML Document term XMLDoc. There is only one element information item term in this list (the Document Element). This list also contains all processing instruction items, comment items, and the Document Type Declaration item occurring in the prolog and epilog of the XML document.
     
  • getDocumentElement( +XMLDoc, Element )

    This predicate returns in argument Element the element information item term corresponding to the document element of the given XMLDoc document item term.
     
  • getDocumentNotations( +XMLDoc, NotationMap )

    This predicate returns in argument NotationMap an ordered Named Map containing the declared notation information items of the given XMLDoc document item term. This map is ordered by the name of the notation, an NCName, which is a constant in UTF-8.
     
  • getDocumentUnparsedEntities( +XMLDoc, UnparsedMap )

    This predicate returns in argument UnparsedMap an ordered Named Map containing the declared unparsed external entity information items of the given XMLDoc document item term. This map is ordered by the name of the unparsed entity, an NCName, which is a constant in UTF-8.
     
  • getDocumentBaseURI( +XMLDoc, BaseURI )

    Obtains the BaseURI of the given XMLDoc document item term. The value of argument BaseURI is an IRI reference term.
     
  • getDocumentEncoding( +XMLDoc, Encoding )

    Obtains the Encoding of the given XMLDoc document item term. Argument Encoding is a constant with the name of the character encoding scheme in which the document entity is expressed.
  • getDocumentStandalone( +XMLDoc, Standalone )

    This predicate returns in argument Standalone the indication of the standalone status of the given XMLDoc document item term. Argument Standalone is the constant yes or no, or the empty list if such information was not provided in the XML declaration.
  • getDocumentVersion( +XMLDoc, Version )

    This predicate returns in argument Version the constant with the XML version of the given XMLDoc document item term. The empty list [] is returned if there is no XML declaration. Currently, only XML 1.0 is supported.
     
  • getDocumentAllProcessed( +XMLDoc, AllProcessed )

    This predicate returns in argument AllProcessed the constant yes or no indicating whether the processor has read the complete DTD of the given XMLDoc document item term. The empty list is returned if there is no DTD.

  • getDocumentDTD( +XMLDoc, DTD )

    This predicate returns in argument DTD the Document Type Declaration item term of the given XMLDoc document item term. The empty list is returned if there is no DTD.
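
A typical first step, extracting the document element and reading its name, might look as follows. This is a minimal sketch: root_name/3 is a hypothetical helper, and the two accessor clauses are local stand-ins that merely mirror the term layouts of Section 2.1 so that the fragment runs without the W4 library. In an application the accessors exported by the module should be used instead.

```prolog
% Local stand-ins for the module's accessors (assumption: they behave like
% plain argument selectors on the terms described in Section 2.1).
getDocumentElement(document(_, Root, _, _, _, _, _, _, _, _), Root).
getElementName(element(NS, Local, Prefix, _, _, _, _, _, _), NS, Local, Prefix).

% root_name(+XMLDoc, -NamespaceURI, -LocalName): expanded name of the
% document element. Hypothetical helper, not part of the module.
root_name(XMLDoc, NamespaceURI, LocalName) :-
    getDocumentElement(XMLDoc, Root),
    getElementName(Root, NamespaceURI, LocalName, _Prefix).
```

Because only accessor predicates touch the term, root_name/3 would keep working if a later version of the parser changed the arity or argument order of document/10.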

Element Item Terms

  • isElement( +Item )

    Succeeds if the argument is an XML element item term.
     
  • getElementName( +EltItem, NamespaceURI, Local, Prefix )

    Given an element item term in the argument EltItem, this predicate returns the NamespaceURI, the Local part, and the Prefix of the element's qualified name. The NamespaceURI and Local identify the element.  The NamespaceURI should be an absolute URI, while Local and Prefix are NCNames. The last three arguments are constants in UTF-8 encoding. Both Prefix and NamespaceURI are the empty constant '' whenever the element does not belong to a namespace.
     
  • getElementChildren( +EltItem, ChildList )

    This predicate returns in the ChildList argument an ordered NodeList containing the child information item terms, in document order, of the XML element item term EltItem. Notice that no two character data items may appear consecutively, and that references to unprocessed external entities also appear in the node list.
     
  • getElementAttributes( +EltItem, Attributes, NSAttributes )

    Predicate getElementAttributes/3 returns the two ordered Named Maps Attributes and NSAttributes of the XML element item term EltItem. Map Attributes contains all attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element. The map is ordered by the key ename(NamespaceURI,LocalName) obtained from the NamespaceURI and LocalName of each attribute. Map NSAttributes contains attribute information items, one for each of the namespace declarations (specified or defaulted from the DTD) of this element. The map is ordered by the key ename('http://www.w3.org/2000/xmlns/',Prefix), where Prefix is the constant corresponding to the prefix declared by the namespace attribute. Prefix is the empty constant '' if the declaration is of the form xmlns="[SOME URI]".
     
  • getElementInScopeNamespaces( +EltItem, Namespaces )

    This predicate returns in the Namespaces argument an ordered Named Map of namespace URIs, one for each of the namespaces in effect for the element item term EltItem. The map is ordered by Prefix, and maps the Prefix to the namespace URI (a namespace information item). The map always contains an item with prefix xml, which is implicitly bound to the namespace name http://www.w3.org/XML/1998/namespace. The map also contains a value for the empty prefix '', corresponding to the default namespace. If there is no default namespace declared, the value for '' is also the empty constant.
     
  • getElementBaseURI( +EltItem, BaseURI )

    This predicate returns in the BaseURI argument an IRI reference term with the Base URI of the element item in the given EltItem argument.
     
  • getElementLang( +EltItem, Lang )

    This predicate returns in the Lang argument a list of Unicode character codes with the language tag for the element item in the given EltItem argument.

Attribute Item Terms

  • isAttribute( +Item )

    Succeeds if the argument is an XML attribute item term.
     
  • getAttributeName( +AttItem, NamespaceURI, Local, Prefix )

    Given an attribute item term in the argument AttItem, this predicate returns the NamespaceURI, the Local part, and the Prefix of the qualified name. The NamespaceURI and Local uniquely identify this attribute.  The NamespaceURI should be an absolute URI, while Local and Prefix are NCNames. The last three arguments are constants in UTF-8 encoding. Both Prefix and NamespaceURI are the empty constant '' whenever the attribute does not belong to a namespace.
     
  • getAttributeValue( +AttItem, Value )

    This predicate returns in the Value argument the list of Unicode character codes corresponding to the normalized value of the attribute item given in argument AttItem.
     
  • getAttributeSpecified( +AttItem, Specified )

    This predicate returns the Specified flag of the attribute item in argument AttItem. The Specified argument can take the values  yes if this attribute was actually specified in the start-tag of its element; or no if it was defaulted from the DTD.
     
  • getAttributeType( +AttItem, Type )

    Obtains the Type of the attribute item term AttItem, as declared in the DTD, or the empty list if the attribute was not declared.
    Currently, it always returns the empty list.
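
The attribute accessors can be combined, for example, to read only those attributes that were written in the start-tag and to convert their values from code lists to atoms. The sketch below uses a hypothetical helper specified_attribute/3. The three accessor clauses are local stand-ins that mirror the attribute/6 layout of Section 2.1, so that the fragment runs on its own. They are not the module's implementation.

```prolog
% Local stand-ins for the module's accessors (assumed to select the
% corresponding arguments of attribute/6).
getAttributeName(attribute(NS, Local, Prefix, _, _, _), NS, Local, Prefix).
getAttributeValue(attribute(_, _, _, Value, _, _), Value).
getAttributeSpecified(attribute(_, _, _, _, Specified, _), Specified).

% specified_attribute(+AttItem, -Local, -ValueAtom): succeeds only for
% attributes present in the start-tag, returning the local name and the
% normalized value as an atom. Hypothetical helper.
specified_attribute(AttItem, Local, ValueAtom) :-
    getAttributeSpecified(AttItem, yes),
    getAttributeName(AttItem, _NamespaceURI, Local, _Prefix),
    getAttributeValue(AttItem, Codes),
    atom_codes(ValueAtom, Codes).
```

Whether atom_codes/2 accepts every Unicode code point depends on the Prolog system in use, so values outside the range supported by the host system may need to stay as code lists.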

Processing Instruction Item Terms

  • isPI( +Item )

    Succeeds if the argument is an XML processing instruction item term.
     
  • getPITarget( +PIItem, Target )

    This predicate returns in the Target argument a constant with the target of the processing instruction item provided in the PIItem argument. The target constant is an NCName in UTF-8 encoding.
     
  • getPIContent( +PIItem, Content )

    This predicate returns in the Content argument a list of Unicode character codes of the processing instruction content in the given PIItem term, excluding the mandatory whitespace after the target and the final ?> delimiter.
     
  • getPIBaseURI( +PIItem, BaseURI )

    This predicate returns in the BaseURI argument an IRI reference term with the Base URI of the processing instruction item in the given PIItem term.

Comment Item Terms

  • isComment( +Item )

    Succeeds if the argument is an XML comment item term.
     
  • getCommentContent( +CommItem, Content )

    This predicate returns in the Content argument a list of Unicode character codes of the comment content in the given CommItem term. This does not include the starting <!-- and finishing --> comment delimiters.

Character Information Items

  • isCharData( +Item )

    Succeeds if the argument is an XML Character Data Information item, including whitespace.
     
  • isWhiteSpace( +Item )

    Succeeds if the argument is whitespace.
     
  • getCharData( +CharItem, Content )

    This predicate returns in argument Content a list of Unicode character codes of the text content in the CharItem term.
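
Putting the element and character data predicates together, the text appearing directly inside an element can be gathered as sketched below. The helper names element_text/2, nodes_text/2 and join_codes/3 are hypothetical. The accessor clauses are local stand-ins based on the term layouts of Section 2.1 and on the assumption that getCharData/2 accepts both pcdata and whitespace items. Replace them with the module's exported predicates in real code.

```prolog
% Local stand-ins so that the sketch runs without the W4 library.
getElementChildren(element(_, _, _, _, _, Children, _, _, _), Children).
isCharData(pcdata(_)).
isCharData(whitespace(_)).
getCharData(pcdata(Content), Content).
getCharData(whitespace(Content), Content).
isEmptyNodeList([]).
getHeadNodeList([Head|_], Head).
getTailNodeList([_|Tail], Tail).

% element_text(+EltItem, -Codes): concatenation of the character data
% items that are direct children of EltItem. Hypothetical helper.
element_text(EltItem, Codes) :-
    getElementChildren(EltItem, Children),
    nodes_text(Children, Codes).

nodes_text(NodeList, []) :-
    isEmptyNodeList(NodeList), !.
nodes_text(NodeList, Codes) :-
    getHeadNodeList(NodeList, Item),
    getTailNodeList(NodeList, Rest),
    nodes_text(Rest, RestCodes),
    (   isCharData(Item)
    ->  getCharData(Item, Content),
        join_codes(Content, RestCodes, Codes)
    ;   Codes = RestCodes          % skip elements, comments, PIs
    ).

% join_codes(+Xs, +Ys, -Zs): Zs is Xs followed by Ys.
join_codes([], Ys, Ys).
join_codes([X|Xs], Ys, [X|Zs]) :- join_codes(Xs, Ys, Zs).
```

The sketch does not descend into child elements. A recursive variant would call element_text/2 on every child for which isElement/1 succeeds.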

3.2 Working with the Document Type Declaration

The XML DOM module provides basic support for DTDs. The user is able to obtain all the properties of Document Type Declaration information items, plus the specification of attributes and elements. Currently, no information about internal entities is kept, and therefore one cannot "regenerate" the original document.

Document Type Declaration Item Terms

  • isDocumentType( +Item )

    Succeeds if the argument is a Document Type Declaration item term.
     
  • getDocumentTypeChildren( +DTDItem, ChildList )

    This predicate returns in the ChildList argument an ordered NodeList of processing instruction information items appearing in the DTD, in document order. For future compatibility, users should check that the items they process are indeed PIs, since this property might be extended to contain all the markup declarations of the DTD.
     
  • getDocumentTypeQualifiedName( +DTDItem, QName )

    This predicate returns in argument QName a term of the form qname(Prefix,LocalName) representing the document element qualified name, as it appears in the Document Type Declaration item term DTDItem.
     
  • getDocumentTypePublicId( +DTDItem, PublicId )

    This predicate returns in argument PublicId a constant with the public identifier of the external subset, as it appears in the Document Type Declaration item term DTDItem.
     
  • getDocumentTypeSystemId( +DTDItem, SystemId )

    This predicate returns in argument SystemId an IRI ref term with the system identifier of the external subset, as it appears in the Document Type Declaration item term DTDItem.
     
  • getDocumentTypeElementDeclarations( +DTDItem, ElemDecl )

    This predicate returns in argument ElemDecl an ordered NamedMap of element content specification items, one for each element type declaration found in the Document Type Declaration item term DTDItem. The map is ordered by the key qname(Prefix,LocalName) obtained from the element's qualified name being declared.
     
  • getDocumentTypeAttributeDeclarations( +DTDItem, AttDecl )

    This predicate returns in argument AttDecl an ordered NamedMap of attribute list declarations, one for each element with declared attributes found in the Document Type Declaration item term DTDItem. The map is ordered by the key qname(Prefix,LocalName) obtained from the element's qualified name.

Additionally, the following predicates can be used to obtain directly the element and attribute declarations from the DTD.

  • getElementSpecificationFromDTD( +DTDItem, +ElemQName, ElemSpec )

    This predicate returns in argument ElemSpec an element specification term describing the content of the element with the qualified name ElemQName, as it appears in the Document Type Declaration item term DTDItem. The element qualified name must be a term of the form qname(Prefix,Local).
     
  • getAttributeDeclarationFromDTD( +DTDItem, +ElemQName, +AttQName, Type, Default )

    This predicate returns in arguments Type and Default the declaration of attribute AttQName in element ElemQName, as it appears in the Document Type Declaration item term DTDItem. The element and attribute's qualified names must be terms of the form qname(Prefix,Local).
     
  • getDefaultAttributesFromDTD( +DTDItem, +QName, Attributes, NSAttributes )

    This predicate returns the ordered NamedMap with the default attributes and Namespace attributes obtained from the Document Type Declaration item term DTDItem, for element QName. The element qualified name must be a term of the form qname(Prefix,Local).
     

Element Specification Terms

  • isElementSpecification( +Item )

    Succeeds if the argument is an Element Specification item.
     
  • getElementSpecification( +ElemSpec, ContentSpec )

    This predicate returns in the ContentSpec the content specification term of the element specification item ElemSpec. The structure of content specification terms is described in Section 2.2 above.
     

Attribute List Declaration Terms

  • isAttributeListDeclaration( +Item )

    Succeeds if the argument is an Attribute List Declaration item.
     
  • getAttributeListDeclaration( +AttList, AttMap )

    Obtains the ordered NamedMap AttMap of terms of the form attribute_decl(Type,Default). The map is ordered by the key qname(Prefix,LocalPart) obtained from the attribute qualified name in the DTD.
     
  • getAttributeDeclaration( +AttDecl, Type, Default )

    This predicate returns in arguments Type and Default the declaration extracted from a term of the form attribute_decl(Type,Default) in AttDecl.  The structures of type and default value terms are described in Section 2.2 above.
     
  • getAttributeDeclaration( +AttList, +AttQName, Type, Default )

    This predicate returns in arguments Type and Default the declaration of attribute AttQName in the attribute list declaration item provided in argument AttList. The attribute's qualified name must be a term of the form qname(Prefix,Local). The structures of type and default value terms are described in Section 2.2 above.
     

3.3 Working with Entities and Notation

The document item term keeps ordered Named Maps containing the notations and unparsed entities declared in the internal part of the DTD. Furthermore, external parsed entity references are substituted by

  • isExternalEntity( +Item )

    Succeeds if the argument is an (unexpanded) external parsed entity reference.
     
  • isUnparsedEntity( +Item )

    Succeeds if the argument is an unparsed entity declaration item.
     
  • getEntityName( +EntItem, NCName )

    This predicate returns in argument NCName the name of the (unparsed or unexpanded) entity item given in argument EntItem.
     
  • getEntityPublicId( +EntItem, PublicId )

    This predicate returns in argument PublicId a constant with the public identifier of the (unparsed or unexpanded) entity item given in argument EntItem.
     
  • getEntitySystemId( +EntItem, SystemId )

    This predicate returns in argument SystemId an IRI ref term with the system identifier of the (unparsed or unexpanded) entity item given in argument EntItem.
     
  • getEntityBaseURI( +EntItem, BaseURI )

    This predicate returns in argument BaseURI an IRI ref term with the base URI relative to which the system identifier of EntItem term should be resolved.
     
  • getEntityNotationName( +EntItem, NotName )

    This predicate returns in argument NotName the NCName notation name associated with the entity term EntItem.


Notation Item Terms

  • isNotation( +Item )

    Succeeds if the argument is a Notation item term.
     
  • getNotationName( +NotItem, NCName )

    This predicate returns in argument NCName the name of the notation item given in argument NotItem.
     
  • getNotationPublicId( +NotItem, PublicId )

    This predicate returns in argument PublicId a constant with the public identifier of the notation item term NotItem.
     
  • getNotationSystemId( +NotItem, SystemId )

    This predicate returns in argument SystemId an IRI ref term with the system identifier of the notation item term NotItem.
     
  • getNotationBaseURI( +NotItem, BaseURI )

    This predicate returns in argument BaseURI an IRI ref term with the base URI relative to which the system identifier of NotItem term should be resolved.
     

3.4 Working with Node lists and Named Maps

The XML DOM representation uses node lists and named maps to represent several properties of the XML Info Sets. Even though both Node Lists and Named Maps are ordinary lists, we suggest using the following predicates to traverse them, in order to guarantee compatibility with future versions of the XML DOM Module.

Node Lists

Node lists are used in the XML DOM module to represent the children of Document, Element, and Document Type Definition items. Iteration is programmed with the predicates getHeadNodeList/2, getTailNodeList/2 and isEmptyNodeList/1.

  • isNodeList( +Item )

    Succeeds if the argument is a Node List term.
     
  • isEmptyNodeList( +NodeList )

    Succeeds if the argument is an empty Node List term.
     
  • getHeadNodeList( +NodeList, Item )

    This predicate returns in argument Item the first element of  NodeList. Fails if NodeList is empty.  
     
  • getTailNodeList( +NodeList, Tail )

    This predicate returns in argument Tail the node list obtained by removing the first element in NodeList. Fails if NodeList is empty. 
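
The iteration pattern described above can be sketched as follows. The predicate selectNodes/3 is a hypothetical helper, not part of the module; it collects the items of a node list that satisfy a given test. The three stand-in clauses assume that node lists are plain Prolog lists, as stated in this section, and exist only to make the sketch runnable on its own. The example also assumes that call/2 is available in the Prolog system in use.

```prolog
% Stand-in definitions, ASSUMING node lists are ordinary Prolog lists.
% Remove these clauses when the real XML DOM module is loaded.
isEmptyNodeList( [] ).
getHeadNodeList( [Item|_], Item ).
getTailNodeList( [_|Tail], Tail ).

% selectNodes( +Test, +NodeList, -Selected )
% Hypothetical helper: Selected holds, in document order, the items of
% NodeList for which call( Test, Item ) succeeds.
selectNodes( Test, NodeList, Selected ) :-
	getHeadNodeList( NodeList, Item ), !,
	getTailNodeList( NodeList, Rest ),
	( call( Test, Item ) ->
		Selected = [Item|More]
	;	Selected = More
	),
	selectNodes( Test, Rest, More ).
selectNodes( _, NodeList, [] ) :-
	isEmptyNodeList( NodeList ).
```

With the module loaded, a call such as selectNodes( isElement, Children, ChildElements ), where Children was obtained with getElementChildren/2, would be expected to return only the element children.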

Ordered Named Maps

Ordered Named Maps are used in the XML DOM module to represent the ordered lists of unparsed entities and notation items in the Document item; the attributes, namespace attributes and in-scope namespace items in element items; and the element and attribute list declarations in document type definition items. Iteration is programmed with the predicates getFirstNamedMap/3, getRestNamedMap/2 and isEmptyNamedMap/1. The Named Map is ordered by a complex term key, using the standard Prolog @< term ordering. A predicate for searching the named map by key, getNamedItem/3, is also provided.

  • isNamedMap( +Item )

    Succeeds if the argument is an ordered Named Map.
     
  • isEmptyNamedMap( +NamedMap )

    Succeeds if the argument is an empty Named Map.
     
  • getFirstNamedMap( +NamedMap, Item )

    This predicate returns in argument Item the first element of  NamedMap. Fails if NamedMap is empty.  
     
  • getFirstNamedMap( +NamedMap, Key, Item )

    This predicate returns in argument Item the first element of  NamedMap, and the corresponding key in the second argument. Fails if NamedMap is empty.  
     
  • getRestNamedMap( +NamedMap, Tail )

    This predicate returns in argument Tail the ordered map obtained by removing the first element in NamedMap. Fails if NamedMap is empty.
     
  • getNamedItem( +NamedMap, +Key, Item )

    This predicate returns in argument Item the element of  NamedMap with the key provided in the second argument. Fails if NamedMap does not contain an element with this key.
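
A traversal of a named map follows the same scheme as for node lists. The sketch below collects the keys of a map in order. The predicate namedMapKeys/2 is a hypothetical helper, and the stand-in clauses assume, purely for illustration, that a named map is a list of Key-Item pairs; the actual internal layout is not documented here and may differ.

```prolog
% Stand-in definitions, ASSUMING a named map is a list of Key-Item pairs.
% This layout is a guess made for the example only.
% Remove these clauses when the real XML DOM module is loaded.
isEmptyNamedMap( [] ).
getFirstNamedMap( [Key-Item|_], Key, Item ).
getRestNamedMap( [_|Rest], Rest ).

% namedMapKeys( +NamedMap, -Keys )
% Hypothetical helper: Keys is the list of keys of NamedMap, in map order.
namedMapKeys( NamedMap, [Key|Keys] ) :-
	getFirstNamedMap( NamedMap, Key, _ ), !,
	getRestNamedMap( NamedMap, Rest ),
	namedMapKeys( Rest, Keys ).
namedMapKeys( NamedMap, [] ) :-
	isEmptyNamedMap( NamedMap ).
```

Since the map is ordered by key, the resulting list of keys should already be sorted with respect to @<. To fetch a single item by a known key, getNamedItem/3 is the more direct choice.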
 

4. Sample Code

Processing XML documents in Prolog is rather straightforward, but sometimes tedious. This section presents sample code for writing an XML document term to a stream. The user can easily adapt the code to iterate over the several item term types. This sample code is a subset of our module XML Write, and may be used as a general template for XML document term processing. The code below must be complemented with definitions of the predicates writeString/2, writeEscapedString/2 and writeEscapedAttributeValue/2.
 

writeSimpleXML( Stream, Item ) :-
	isDocument( Item ), !,
	writeXMLDocumentItem( Stream, Item ).
	
% Only writes the document item children.
% The XML Declaration and the DTD are ignored.
writeXMLDocumentItem( Stream, Doc ) :-
	getDocumentChildren( Doc, Children ),
	writeXMLNodeList( Stream, Children ).

% Iterates over the items in the NodeList
writeXMLNodeList( Stream, NodeList ) :- 
	getHeadNodeList( NodeList, Item ), !,
	( isElement( Item )       -> writeXMLElement( Stream, Item )
	; isCharData( Item )      -> writeXMLCharData( Stream, Item )
	; isComment( Item )       -> writeXMLComment( Stream, Item )
	; isPI( Item )            -> writeXMLPI( Stream, Item )
	; isExternalEntity( Item )-> writeXMLEntityReference( Stream, Item )
	; otherwise               -> true
	),
	getTailNodeList( NodeList, RestNodeList ), !,
	writeXMLNodeList( Stream, RestNodeList ).
writeXMLNodeList( _, NodeList ) :- 
        isEmptyNodeList( NodeList ).

% Writes an element
writeXMLElement( Stream, EltItem) :- 
	getElementName( EltItem, _, Local, Prefix ),
	write( Stream, '<' ),
	writeQName( Stream, Prefix, Local ),

	getElementAttributes( EltItem, Attributes, NSAttributes ),
	writeXMLAttributes( Stream, NSAttributes ),
	writeXMLAttributes( Stream, Attributes ),
	write( Stream, '>' ),

	getElementChildren( EltItem, Children ),
	writeXMLNodeList( Stream, Children ),

	write( Stream, '</' ),
	writeQName( Stream, Prefix, Local ),
	write( Stream, '>' ).

% Iterates over the attributes and writes them
writeXMLAttributes( Stream, NamedMap ) :- 
	getFirstNamedMap( NamedMap, _, Att ), !,
	writeXMLAttribute( Stream, Att ),
	getRestNamedMap( NamedMap, RestNamedMap ), !,
	writeXMLAttributes( Stream, RestNamedMap ).
writeXMLAttributes( _, NamedMap ) :- 
        isEmptyNamedMap( NamedMap ).

writeXMLAttribute( Stream, Att ) :- 
	getAttributeName( Att, _, Local, Prefix ),
	write( Stream, ' ' ),
	writeQName( Stream, Prefix, Local ),

	getAttributeValue( Att, NormValue ),
	write( Stream, '="' ),
	writeEscapedAttributeValue( Stream, NormValue ),
	write( Stream, '"' ).

writeXMLCharData( Stream, CharData ) :- 
	getCharData( CharData, Content ),
	writeEscapedString( Stream, Content ).

writeXMLComment( Stream, Comment ) :-
	getCommentContent( Comment, Text ),
	write( Stream, '<!--' ), 
	writeString( Stream, Text ),
	write( Stream, '-->' ).

writeXMLPI( Stream, PI ) :- 
	getPITarget( PI, Target ),
	getPIContent( PI, Content ),
	write( Stream, '<?' ),
	writeNCName( Stream, Target ),
	( Content \= [] ->
		write( Stream, ' ' ),
		writeString( Stream, Content )
	;	true
	),
	write( Stream, '?>' ).

writeXMLEntityReference( Stream, EntityRef ) :-
	getEntityName( EntityRef, EntName ),
	write( Stream, '&' ),
	writeNCName( Stream, EntName ),
	write( Stream, ';' ).

% NCNames are already in UTF-8 encoding
writeNCName( Stream, Name ) :-
	write( Stream, Name ).

% The prefix and local parts of a qualified name are already in UTF-8 encoding.
% Notice how the empty prefix is tested.
writeQName( Stream, Prefix, Local ) :- 
	( Prefix = '' -> 
		write( Stream, Local ) 
	; 	write( Stream, Prefix ), 
		write( Stream, ':' ), 
		write( Stream, Local )
	).
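
The sample code leaves the predicates writeString/2, writeEscapedString/2 and writeEscapedAttributeValue/2 undefined. One possible way to define them is sketched below. This is not the definition used in the XML Write module: the auxiliary predicates escapeCodes/3, escapeCode/3 and appendCodes/3 are hypothetical names, only a minimal set of characters is escaped, and the stream is assumed to encode the character codes written with put_code/2 appropriately.

```prolog
% Writes a list of character codes verbatim.
writeString( _, [] ).
writeString( Stream, [Code|Codes] ) :-
	put_code( Stream, Code ),
	writeString( Stream, Codes ).

% Writes character data, escaping &, < and >.
writeEscapedString( Stream, Codes ) :-
	escapeCodes( Codes, content, Escaped ),
	writeString( Stream, Escaped ).

% Writes an attribute value delimited by double quotes,
% escaping &, < and the double quote itself.
writeEscapedAttributeValue( Stream, Codes ) :-
	escapeCodes( Codes, attribute, Escaped ),
	writeString( Stream, Escaped ).

% escapeCodes( +Codes, +Mode, -Escaped ): Mode is content or attribute.
escapeCodes( [], _, [] ).
escapeCodes( [Code|Codes], Mode, Escaped ) :-
	( escapeCode( Mode, Code, Entity ) ->
		atom_codes( Entity, EntityCodes ),
		appendCodes( EntityCodes, Rest, Escaped )
	;	Escaped = [Code|Rest]
	),
	escapeCodes( Codes, Mode, Rest ).

% escapeCode( ?Mode, +Code, -Entity ): replacement text for a code.
escapeCode( _,         38, '&amp;' ).	% &
escapeCode( _,         60, '&lt;' ).	% <
escapeCode( content,   62, '&gt;' ).	% >
escapeCode( attribute, 34, '&quot;' ).	% "

% Local append, to avoid depending on a list library.
appendCodes( [], Tail, Tail ).
appendCodes( [Code|Codes], Tail, [Code|More] ) :-
	appendCodes( Codes, Tail, More ).
```

White space characters in attribute values (tab, line feed, carriage return) are written as they are in this sketch; an application that needs them to survive a later parse would have to write them as character references instead.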

5. Limitations

  • The type of attributes declared in the DTD is not introduced in the XML DOM representation.
  • Internal and parameter entities are not kept in the representation.
  • The properties which involve referencing other information items are not implemented ( [parent], [references], [notation] and [owner element]).
    This is motivated by the fact that some Prolog systems do not give support to cyclic terms.

6. Copyright

(c) Carlos Viegas Damásio (cd@di.fct.unl.pt)
CENTRIA - Centro de Inteligência Artificial da Universidade Nova de Lisboa

This software is distributed under the GNU Library General Public License.
 
Last update: November 9th, 2003