Relationships in XML
John I. Bobbitt
POSC
License Agreement: © 2004, Petrotechnical Open Standards Consortium, Inc. All rights reserved. All access, receipt, and/or use of this document is subject to the POSC Product Licensing Agreement posted on the POSC Web site at http://www.posc.org/about/license.shtml.
Abstract: A data model allows entities (note 2) to have relationships to other entities. Three methods for implementing these relationships in XML are given, along with a discussion of the advantages and disadvantages of each. In addition, the concept of a remote reference is introduced, which allows hybrid methods to be used.
A data model contains entities with relationships to each other. In XML, an entity becomes a module: an independent complex type that can be inserted into an application schema as needed. The data model defines relationships between these entities, and these relationships must be reflected in the way the modules are related.
Three methods are described in later sections. They are referred to as the “id method,” the “natural key method,” and the “child method.” For each method, it is necessary to show how the writer of the XML implements the relationship and how the reader interprets it.
The powerful concept of the remote reference is also discussed. A remote reference may allow application schema developers to switch the referencing method that a module uses to another of the methods. It also allows dictionaries of instances to exist, which means that the referenced module may be a reference only, with the details of the module contained elsewhere (such as in an XML dictionary, or in a database with XML services available).
The example used in this paper is a well (note 3) with a relationship to one or more well associates. The well is represented by an XML module. The well associates are associated with one or more “business associate” modules, and the relationship can carry a role (such as operator, royalty owner, service company, or driller).

An example of the XML that might come from this is shown below:
<Well>
  . . .
  <WellAssociates>
    <Operator href="someBAInstance"/>
    <RoyaltyOwner>
      <Identifier namingSystem="MyCompany">Johann Deaux</Identifier>
    </RoyaltyOwner>
  </WellAssociates>
  . . .
</Well>
This is not meant to show the proper way to handle the relationship between modules. It illustrates the nature of a relationship, and two ways of expressing it: a reference through an href attribute (Operator), and an inline identifier (RoyaltyOwner).
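A reader of this fragment has to tell the two styles apart before it can act on either one. The following Python sketch (standard library xml.etree.ElementTree only) does that for the example above. The function name describe_associates is hypothetical, and the sketch assumes every role element is a direct child of WellAssociates.

```python
import xml.etree.ElementTree as ET

# The example fragment above, with the ". . ." elisions left out.
WELL_XML = """
<Well>
  <WellAssociates>
    <Operator href="someBAInstance"/>
    <RoyaltyOwner>
      <Identifier namingSystem="MyCompany">Johann Deaux</Identifier>
    </RoyaltyOwner>
  </WellAssociates>
</Well>
"""


def describe_associates(well: ET.Element) -> list[tuple[str, str, str]]:
    """Return (role, style, value) for each child of WellAssociates.

    style is "href" when the role points elsewhere through an href
    attribute, and "identifier" when it carries an inline Identifier.
    Roles using neither style are skipped.
    """
    result = []
    associates = well.find("WellAssociates")
    if associates is None:
        return result
    for role in associates:
        if "href" in role.attrib:
            result.append((role.tag, "href", role.attrib["href"]))
        else:
            ident = role.find("Identifier")
            if ident is not None:
                result.append((role.tag, "identifier", (ident.text or "").strip()))
    return result


if __name__ == "__main__":
    for entry in describe_associates(ET.fromstring(WELL_XML)):
        print(entry)
```

For the example this yields one href-style entry for Operator and one identifier-style entry for RoyaltyOwner; everything after that point depends on which style was found.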
The methods are described in the following sections.
The id method is intended to make use of the DTD concepts of ID and IDREF. It also makes use of the id() function, which is defined by XPath and can therefore be used in XSLT.
The referencing element would appear as:
<Operator href="someBAInstance"/>
and the referenced module element would be (note 4):
<BusinessAssociate id="someBAInstance"> . . . some additional elements, probably . . . </BusinessAssociate>
It is important to note that the href value 'someBAInstance' has a corresponding element whose id attribute has that same value. The value does not need to have any semantic meaning, and does not need to be the same from one XML instance document to the next. It is strictly a value that allows internal referencing from one element to the other.
If the id attribute is declared in the DTD to be of type ID, and the href attribute to be of type IDREF:
<!ATTLIST Operator href IDREF #REQUIRED>
<!ATTLIST BusinessAssociate id ID #REQUIRED>
then the id() function (for example, id(@href) evaluated on an Operator element) can be used in XSLT to “move” directly to the referenced instance.
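A similar result can be achieved with the XML Schema constructs key and keyref, which enforce the reference when the document is validated, together with an xsl:key declaration and the XSLT key() function, which follow the reference when the document is processed.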
The important points about the id attribute method are that it carries no semantics, is used strictly for internal referencing, and may vary from document to document. There is no intent that its value should be exposed to the reader other than for the referencing concept.
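The id method's resolution step can be sketched in Python with xml.etree.ElementTree. That library does not process DTD attribute types, so the sketch builds its own index of id values rather than relying on declared ID and IDREF types. The wrapper element Document, the Identifier/Name content of the BusinessAssociate, and the function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the BusinessAssociate content is illustrative.
DOC = """
<Document>
  <Operator href="someBAInstance"/>
  <BusinessAssociate id="someBAInstance">
    <Identifier><Name>BP America</Name></Identifier>
  </BusinessAssociate>
</Document>
"""


def index_by_id(root: ET.Element) -> dict[str, ET.Element]:
    """Map each id attribute value to its element, rejecting duplicates
    (the document-wide uniqueness rule of an XML ID)."""
    index: dict[str, ET.Element] = {}
    for elem in root.iter():
        value = elem.get("id")
        if value is None:
            continue
        if value in index:
            raise ValueError(f"duplicate id {value!r}")
        index[value] = elem
    return index


def resolve_href(elem: ET.Element, index: dict[str, ET.Element]) -> ET.Element:
    """Follow elem's href to the element with the same id (the IDREF rule:
    a reference that matches nothing is an error)."""
    href = elem.get("href")
    if href is None or href not in index:
        raise KeyError(f"unresolved href {href!r}")
    return index[href]


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    target = resolve_href(root.find("Operator"), index_by_id(root))
    print(target.tag, target.findtext("Identifier/Name"))
```

Because the value carries no meaning, replacing someBAInstance with any other token in both places leaves the resolved element unchanged.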
It is possible to implement this method using the schema datatypes ID and IDREF. However, the XML Schema identity constraints key and keyref are generally preferable, because they can be scoped to a particular element and can apply to element content as well as to attributes. The following illustrates the method using these two constructs. The key element is comparable to ID, and the keyref element is comparable to IDREF.
Key and keyref must be declared within the declaration of another element, which sets their context. Since we will allow the id to appear anywhere in the document, we need to set this up in the application schema on the “root” element, which we will call MainElement.
Declaring a value as a key does two things: (1) the value must be present and unique within its defining context, and (2) the value becomes a named key that can be referenced by another element. When a keyref is declared to reference this key, each keyref value must match a key value within that context. This is comparable to the behaviour of ID and IDREF, except that the context for ID and IDREF is always the whole document.
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:p="http://xxxxx"
        targetNamespace="http://xxxxx"
        elementFormDefault="qualified">
  <element name="MainElement">
    <complexType>
      <sequence>
        <!-- . . . some elements . . . -->
        <!-- Well is a subelement of MainElement, and contains Operator
             as a subelement -->
        <element name="Well" maxOccurs="unbounded">
          <complexType>
            <sequence>
              <!-- . . . some elements . . . -->
              <element name="Operator"> <!-- no content, attribute only -->
                <complexType>
                  <attribute name="href" type="string"/>
                </complexType>
              </element>
              <!-- . . . some more elements . . . -->
            </sequence>
          </complexType>
        </element>
        <!-- . . . some more elements . . . -->
        <!-- Now we allow one or more BusinessAssociates to be included.
             The details of baType are not given here. -->
        <element name="BusinessAssociate" type="p:baType" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
    <!-- Within the context of MainElement, every BusinessAssociate is
         selected (see selector), and its id attribute (part of baType,
         not shown here) is declared to be a key named baid -->
    <key name="baid">
      <selector xpath="p:BusinessAssociate"/>
      <field xpath="@id"/>
    </key>
    <!-- Every Well/Operator/@href must match a baid value. The keyref is
         declared on MainElement, beside the key, because in XML Schema 1.0
         a keyref can only refer to a key declared on the same element or
         on one of its descendants. -->
    <keyref name="dummy1" refer="p:baid">
      <selector xpath="p:Well/p:Operator"/>
      <field xpath="@href"/>
    </keyref>
  </element>
</schema>
Here is how an XML instance document might look (three wells, two operators). The default namespace, http://xxxxx, is a placeholder for the namespace bound to the p: prefix.
<MainElement xmlns="http://xxxxx">
  . . .
  <Well>
    . . .
    <Operator href="ba1"/>
    . . .
  </Well>
  <Well>
    . . .
    <Operator href="ba2"/>
    . . .
  </Well>
  <Well>
    . . .
    <Operator href="ba1"/>
    . . .
  </Well>
  . . .
  <BusinessAssociate id="ba1">
    . . . stuff about the business associate
  </BusinessAssociate>
  <BusinessAssociate id="ba2">
    . . . stuff about this business associate
  </BusinessAssociate>
  . . .
</MainElement>
The declaration of BusinessAssociate/@id as a key means it must be unique, may not be null, and can be referenced from elsewhere. The declaration of Operator/@href as a keyref means that there must be a BusinessAssociate/@id value that matches the Operator/@href value.
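To make the two constraints concrete, here is a simplified Python sketch of what a validating processor checks for the baid key and the keyref on Operator/@href. It assumes the instance is in the placeholder namespace http://xxxxx (bound to p:) and that both constraints are evaluated in the context of MainElement. It ignores the typed-value comparison and multi-field keys that a real schema processor handles, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://xxxxx"}  # placeholder namespace from this paper

# The instance above, with the ". . ." elisions left out.
INSTANCE = """
<MainElement xmlns="http://xxxxx">
  <Well><Operator href="ba1"/></Well>
  <Well><Operator href="ba2"/></Well>
  <Well><Operator href="ba1"/></Well>
  <BusinessAssociate id="ba1"/>
  <BusinessAssociate id="ba2"/>
</MainElement>
"""


def check_key_keyref(context: ET.Element) -> list[str]:
    """Check key baid and its keyref within one MainElement context.

    Returns error messages; an empty list means both constraints hold.
    """
    errors = []
    keys = set()
    # key: selector p:BusinessAssociate, field @id (present and unique)
    for ba in context.findall("p:BusinessAssociate", NS):
        value = ba.get("id")
        if value is None:
            errors.append("BusinessAssociate without id")
        elif value in keys:
            errors.append(f"duplicate key {value!r}")
        else:
            keys.add(value)
    # keyref: selector p:Well/p:Operator, field @href (must match a key)
    for op in context.findall("p:Well/p:Operator", NS):
        ref = op.get("href")
        if ref is not None and ref not in keys:
            errors.append(f"keyref {ref!r} matches no key")
    return errors


if __name__ == "__main__":
    print(check_key_keyref(ET.fromstring(INSTANCE)))
```

Changing one href to a value with no matching id, or giving two BusinessAssociates the same id, produces the corresponding error messages.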
Given XML such as the above example, how do I “go” from <Operator href="ba1"/> to <BusinessAssociate id="ba1">? That depends on the application used to parse it. Libraries in Perl, Java, and other languages provide functions that do this. The following shows how it is done in XSLT.
<xsl:stylesheet version="1.0"
    xmlns:p="http://xxxxx"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Index every BusinessAssociate by its id attribute -->
  <xsl:key name="bakey" match="p:BusinessAssociate" use="@id"/>
  <xsl:template match="/">
    <!-- . . . do some XSL processing . . . -->
    <!-- now process the wells, and go to the BusinessAssociate -->
    <xsl:apply-templates select="//p:Well"/>
    <!-- . . . more processing; then close the main template -->
  </xsl:template>
  <xsl:template match="p:Well">
    <!-- we are in the Well; find its operator -->
    <xsl:apply-templates select="key('bakey', p:Operator/@href)"/>
    <!-- . . . and continue with other processing -->
  </xsl:template>
  <xsl:template match="p:BusinessAssociate">
    <!-- the key() function has brought us to the BusinessAssociate instance;
         now we can do whatever we need at this spot -->
  </xsl:template>
</xsl:stylesheet>
Users of the key() function should read the [XSL] document for more details of how to use it.
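For readers working outside XSLT, the stylesheet's xsl:key declaration and key() call can be approximated in Python as a multi-valued index. This is a sketch, not an XSLT implementation: the names build_key and key_lookup are hypothetical, use is a Python callable rather than an XPath expression, and results come back in the order of the requested values rather than in document order.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Callable, Iterable, Optional

NS = {"p": "http://xxxxx"}  # placeholder namespace from the stylesheet

INSTANCE = """
<MainElement xmlns="http://xxxxx">
  <Well><Operator href="ba1"/></Well>
  <Well><Operator href="ba2"/></Well>
  <Well><Operator href="ba1"/></Well>
  <BusinessAssociate id="ba1"/>
  <BusinessAssociate id="ba2"/>
</MainElement>
"""


def build_key(root: ET.Element, match: str,
              use: Callable[[ET.Element], Optional[str]]) -> dict[str, list[ET.Element]]:
    """Approximate <xsl:key match=... use=...>: index every element matching
    `match` (anywhere below root) by the value returned by `use`.
    Several elements may share one value, as with xsl:key."""
    table: dict[str, list[ET.Element]] = defaultdict(list)
    for elem in root.iterfind(".//" + match, NS):
        value = use(elem)
        if value is not None:
            table[value].append(elem)
    return table


def key_lookup(table: dict[str, list[ET.Element]],
               values: Iterable[str]) -> list[ET.Element]:
    """Approximate key(name, node-set): the union of the matches for all
    values, without duplicates. Unmatched values contribute nothing."""
    found: list[ET.Element] = []
    for value in values:
        for elem in table.get(value, []):
            if elem not in found:
                found.append(elem)
    return found


if __name__ == "__main__":
    root = ET.fromstring(INSTANCE)
    bakey = build_key(root, "p:BusinessAssociate", lambda e: e.get("id"))
    for well in root.iterfind(".//p:Well", NS):
        hrefs = [op.get("href") for op in well.findall("p:Operator", NS)]
        print([ba.get("id") for ba in key_lookup(bakey, hrefs)])
```

Passing use=lambda e: e.findtext("p:Identifier/p:Name", namespaces=NS) instead would index the BusinessAssociates by natural key.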
A second way to reference a module is to use a natural key. Here is an example of how the above would look:
<Operator>
<Identifier>
<Name>BP America</Name>
</Identifier>
</Operator>
and the referenced module would be
<BusinessAssociate>
<Identifier>
<Name>BP America</Name>
</Identifier>
. . . addresses, contact names, phone number, etc . . .
</BusinessAssociate>
The referencing mechanism uses a natural key ('BP America' in this example) to reference the appropriate module instance. This case differs from the previous one in that the value generally carries semantic meaning.
There are other differences. The ID and IDREF construct requires the ID attribute to be unique and the IDREF to actually refer to something in the file; this is not the case with the natural key method. An application developer may include the referenced object or not, as he wishes. The natural key still conveys some information about the BusinessAssociate, but it is not necessary to include the instance within the document.
There is nothing new about the application of this method. However, the following sections discuss some of the possibilities: the referenced instance may be absent, present, or remote.
The first possibility is the case above in which the Operator element is present but the BusinessAssociate instance is not.
The semantics of the file is that the Operator is 'BP America.' There is no further information about the Operator other than that name. For the reading program to make use of this information, it must obtain further data about 'BP America' from its own knowledge. It may be hard-wired knowledge. It may be that the reading program knows how to access an instance from another XML file, or from a database. It may be that the reader only needs the name.
In any event, the 'Identifier' is available, and there are many things that can be done with this information only.
When the BusinessAssociate instance is present, the situation is comparable to Section 3: an XSLT stylesheet (for example) can select the instance using the key/keyref method. The difference is that the key is the string value at BusinessAssociate/Identifier/Name, and its reference the value at Operator/Identifier/Name, instead of BusinessAssociate/@id and Operator/@href.
Note that the XSLT stylesheet can be written to handle the case in which the instance is not present: the key() function then returns an empty node-set, and processing continues from there.
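A Python sketch of the natural key method covering the first two cases: the operator's natural key is always available, and the matching BusinessAssociate is returned only when the document contains it. The document, the name Example Operator Co, and the function names are hypothetical. Because this method does not enforce uniqueness, the lookup dictionary simply keeps the last BusinessAssociate for a repeated name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical document: one operator has its instance included, one does not.
DOC = """
<MainElement>
  <Well><Operator><Identifier><Name>BP America</Name></Identifier></Operator></Well>
  <Well><Operator><Identifier><Name>Example Operator Co</Name></Identifier></Operator></Well>
  <BusinessAssociate>
    <Identifier><Name>BP America</Name></Identifier>
  </BusinessAssociate>
</MainElement>
"""


def natural_key(elem: ET.Element) -> Optional[str]:
    """The natural key used in this paper: the text of Identifier/Name."""
    name = elem.findtext("Identifier/Name")
    return name.strip() if name else None


def resolve_operators(root: ET.Element) -> list[tuple[Optional[str], Optional[ET.Element]]]:
    """Pair each Operator's natural key with its BusinessAssociate, or with
    None when the instance is absent (not an error in this method)."""
    by_name = {natural_key(ba): ba for ba in root.iter("BusinessAssociate")}
    return [(natural_key(op), by_name.get(natural_key(op)))
            for op in root.iter("Operator")]


if __name__ == "__main__":
    for name, ba in resolve_operators(ET.fromstring(DOC)):
        # With no instance, the reader falls back on its own knowledge.
        print(name, "->", "instance in document" if ba is not None else "name only")
```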
The reference may also be a remote reference to a dictionary file or to a business associate service. In XSLT, the document() function, combined with key() applied within the retrieved document, allows the reader to go to another document and continue processing from there.
In the child method, an object is associated with another object by instantiating it as a child of the parent. Here is how the above would look:
<Operator>
<Identifier>
<Name>BP America</Name>
</Identifier>
. . . addresses, contact names, phone number, etc . . .
</Operator>
Instead of referencing a business associate that exists elsewhere, it is actually instantiated inline.
In data modelling terminology, we have denormalized the model. The results of this denormalization are (1) that we cannot reuse the definition of 'BP America,' and (2) that if we need to use it again (for example, with another well), we must repeat the whole definition.
This, of course, offers no difficulties to the reader. The information is contained in the instances, and the reader does not need to resolve a reference to find it.
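Because the child method denormalizes, a reader that wants one shared definition has to rebuild it. The Python sketch below converts inline Operator children into href references to one shared BusinessAssociate per natural key, which is one way an application could switch from the child method to the id method. It assumes that inline copies with the same Identifier/Name are identical (it keeps the first and discards the rest) and generates ids ba1, ba2, and so on; the Phone element and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same operator instantiated inline under two wells.
DOC = """
<MainElement>
  <Well>
    <Operator>
      <Identifier><Name>BP America</Name></Identifier>
      <Phone>555-0100</Phone>
    </Operator>
  </Well>
  <Well>
    <Operator>
      <Identifier><Name>BP America</Name></Identifier>
      <Phone>555-0100</Phone>
    </Operator>
  </Well>
</MainElement>
"""


def normalize_operators(root: ET.Element) -> ET.Element:
    """Replace each inline Operator with an href to a shared BusinessAssociate.

    Assumes inline copies with the same Identifier/Name are identical:
    the first copy becomes the shared instance, later ones are dropped.
    """
    ids = {}      # natural key -> generated id
    shared = []   # new BusinessAssociate elements, in first-seen order
    for op in list(root.iter("Operator")):   # snapshot before mutating
        name = op.findtext("Identifier/Name")
        children = list(op)
        if name not in ids:
            ids[name] = f"ba{len(ids) + 1}"
            ba = ET.Element("BusinessAssociate", id=ids[name])
            ba.extend(children)               # keep the first inline definition
            shared.append(ba)
        for child in children:
            op.remove(child)
        op.text = None
        op.set("href", ids[name])
    root.extend(shared)
    return root


if __name__ == "__main__":
    print(ET.tostring(normalize_operators(ET.fromstring(DOC)), encoding="unicode"))
```

The reverse conversion (copying each referenced BusinessAssociate inline) is straightforward, which is why the child method is the easiest for a reader but the hardest to keep consistent.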
The above are the three pure methods of instantiating a relationship between objects. The concept of a remote reference (note 5) allows the instance to be “skipped” to another location, often another document or a service.
Here are some examples of how that can work.
Using the “id method” of section 3.1, we have references to 'ba1' and 'ba2'. Here are two ways to skip to another document. The first is
<RemoteReference id="ba1" href="http:somewhere"/>
which allows the XSLT document() function to send you to “somewhere.” The second is
<BusinessAssociateDictionary href="http:somewhereElse"/>
which has the semantics, “Go to the document identified by the href, and then go to the instance with the required id, such as ba1 or ba2.”
When using the “natural key method,” the remote reference works the same way. The first example becomes:
<RemoteReference href="http:somewhere">BP America</RemoteReference>
while the second example is unchanged, except that you find the instance whose Identifier/Name is 'BP America'.
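A Python sketch of both kinds of skip. A real reader would retrieve the href (for example over HTTP, or from a business associate service); here a hypothetical fetch() parses documents from an in-memory table standing in for the placeholders http:somewhere and http:somewhereElse, whose contents are invented for illustration. The sketch also assumes that a RemoteReference carrying an id points to a document containing an instance with the same id.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical remote dictionary document.
DICTIONARY = """
<Dictionary>
  <BusinessAssociate id="ba1"><Identifier><Name>BP America</Name></Identifier></BusinessAssociate>
  <BusinessAssociate id="ba2"><Identifier><Name>Example Operator Co</Name></Identifier></BusinessAssociate>
</Dictionary>
"""
# Stand-in for network access: href -> document text.
REMOTE = {"http:somewhere": DICTIONARY, "http:somewhereElse": DICTIONARY}

LOCAL = """
<MainElement>
  <Well><Operator href="ba1"/></Well>
  <Well><Operator href="ba2"/></Well>
  <RemoteReference id="ba1" href="http:somewhere"/>
  <BusinessAssociateDictionary href="http:somewhereElse"/>
</MainElement>
"""


def fetch(href: str) -> ET.Element:
    """Hypothetical retrieval of the document named by href."""
    return ET.fromstring(REMOTE[href])


def find_by_id(root: ET.Element, ref_id: str) -> Optional[ET.Element]:
    return next((e for e in root.iter() if e.get("id") == ref_id), None)


def resolve_id(root: ET.Element, ref_id: str,
               dictionary_href: Optional[str] = None) -> Optional[ET.Element]:
    """id method with skips: a local RemoteReference is followed to its
    href; an id with no local element is looked up in the dictionary."""
    target = find_by_id(root, ref_id)
    if target is not None and target.tag == "RemoteReference":
        return find_by_id(fetch(target.get("href")), ref_id)
    if target is None and dictionary_href is not None:
        return find_by_id(fetch(dictionary_href), ref_id)
    return target


def resolve_name(remote_ref: ET.Element) -> Optional[ET.Element]:
    """Natural key method: the href names the document, and the element
    text is the Identifier/Name to look for."""
    name = (remote_ref.text or "").strip()
    for ba in fetch(remote_ref.get("href")).iter("BusinessAssociate"):
        if ba.findtext("Identifier/Name") == name:
            return ba
    return None


if __name__ == "__main__":
    root = ET.fromstring(LOCAL)
    dictionary = root.find("BusinessAssociateDictionary").get("href")
    for op in root.iter("Operator"):
        ba = resolve_id(root, op.get("href"), dictionary)
        print(op.get("href"), "->", ba.findtext("Identifier/Name"))
```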
Consider the example of a Well with many Operators. In a relational model, the Well cannot point to its many Operators and declare that they belong to it. The only way to represent this is to have each of the Operator instances reference the same Well. To find out that a Well has many Operators, you ask for all Operators that reference that Well:
Example SQL:
SELECT Identifier FROM Operator WHERE Well_UWI = 'some well identifier';
XML has an advantage over the relational model here, because the Well element can itself reference zero or more Operators. Any of the above referencing methods can do this.
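The contrast can be run side by side in Python: the relational query from the example (using the standard library's sqlite3 with an in-memory table) and the XML form, in which the Well lists its own operators. The table rows, the uwi attribute, and the name Example Operator Co are hypothetical, and natural keys are used for the XML references to keep the comparison short.

```python
import sqlite3
import xml.etree.ElementTree as ET

ROWS = [("BP America", "W1"), ("Example Operator Co", "W1"), ("BP America", "W2")]

XML_DOC = """
<MainElement>
  <Well uwi="W1">
    <Operator><Identifier><Name>BP America</Name></Identifier></Operator>
    <Operator><Identifier><Name>Example Operator Co</Name></Identifier></Operator>
  </Well>
  <Well uwi="W2">
    <Operator><Identifier><Name>BP America</Name></Identifier></Operator>
  </Well>
</MainElement>
"""


def relational_operators(uwi: str) -> list[str]:
    """Relational form: each Operator row points at its Well, so a Well's
    operators are found by querying the Operator table."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Operator (Identifier TEXT, Well_UWI TEXT)")
        con.executemany("INSERT INTO Operator VALUES (?, ?)", ROWS)
        rows = con.execute(
            "SELECT Identifier FROM Operator WHERE Well_UWI = ? ORDER BY Identifier",
            (uwi,),
        ).fetchall()
    finally:
        con.close()
    return [row[0] for row in rows]


def xml_operators(root: ET.Element, uwi: str) -> list[str]:
    """XML form: the Well element itself holds zero or more Operators."""
    well = root.find(f"Well[@uwi='{uwi}']")
    if well is None:
        return []
    return sorted(op.findtext("Identifier/Name") for op in well.findall("Operator"))


if __name__ == "__main__":
    root = ET.fromstring(XML_DOC)
    for uwi in ("W1", "W2"):
        print(uwi, relational_operators(uwi), xml_operators(root, uwi))
```

Both forms give the same answer; the difference is that the XML reader reads it directly from the Well, while the relational reader must query the table of Operators.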
Note that the referencing can also go the other way. For example, the BusinessAssociate instance can point to the well as:
<BusinessAssociate>
<Identifier>
<Name>BP America</Name>
</Identifier>
<OperatorFor href="well1"/>
. . .
</BusinessAssociate>
In this example, it is easy to see that, although this is possible, it is probably not a good idea. However, there may be times when it is an appropriate referencing method, and the technique should be kept in mind.
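The cost of the reverse direction appears as soon as the reader needs the usual question answered. In the Python sketch below (hypothetical document and function names; well1 and well2 stand for Well ids), finding the operators of one well means scanning every BusinessAssociate, and a reader that asks often would build an inverted index once.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """
<MainElement>
  <BusinessAssociate>
    <Identifier><Name>BP America</Name></Identifier>
    <OperatorFor href="well1"/>
    <OperatorFor href="well2"/>
  </BusinessAssociate>
  <BusinessAssociate>
    <Identifier><Name>Example Operator Co</Name></Identifier>
    <OperatorFor href="well1"/>
  </BusinessAssociate>
</MainElement>
"""


def operators_for_well(root: ET.Element, well_id: str) -> list[str]:
    """Scan every BusinessAssociate for an OperatorFor pointing at well_id."""
    return [
        ba.findtext("Identifier/Name")
        for ba in root.iter("BusinessAssociate")
        if any(ref.get("href") == well_id for ref in ba.findall("OperatorFor"))
    ]


def invert_operator_for(root: ET.Element) -> dict[str, list[str]]:
    """Build a well id -> operator names index in one pass."""
    index: dict[str, list[str]] = defaultdict(list)
    for ba in root.iter("BusinessAssociate"):
        name = ba.findtext("Identifier/Name")
        for ref in ba.findall("OperatorFor"):
            index[ref.get("href")].append(name)
    return dict(index)


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(operators_for_well(root, "well1"))
    print(invert_operator_for(root))
```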
[ANSIX12] X12 Reference Model for XML Design, 2002-10, produced by the ANSI X12 committee, obtainable at http://www.x12.org/x12org/.
[BestPractices] Best Practices Homepage, developed and maintained by XML-dev and Mitre, obtainable at http://www.xfront.com/BestPracticesHomepage.html.
[ComProServ] PIDX XML Standards Master, Version 1.0, RP 3901, produced by PIDX, obtainable at http://committees.api.org/business/pidx/standards.htm.
[EBCCNAM] ebXML RT - Naming Convention for Core Component, 2001-05-10, produced by the ebXML group, obtainable at http://www.ebxml.org/specs/index.htm#technical_reports.
[ebTechArch] ebXML Technical Architecture Specification V1.0.4, 2001-02-16, produced by the ebXML group, obtainable at http://www.ebxml.org/specs/ebTA.pdf.
[FedDevGuide] Draft Federal XML Developer's Guide, 2002-04 (work in progress), produced by the Federal CIO Council, obtainable at http://xml.gov/documents/in_progress/developersguide.pdf.
[FedTagStds] Federal Tag Standards for Extensible Markup Language, 2001-06, produced by LMI, not obtainable from the internet.
[HKGuide] XML Schema Design and Management Guide (4 parts), draft versions dated summer 2003, produced by the Hong Kong Information Services Technology Division, obtainable at http://www.itsd.gov.hk/itsd/english/infra/eif.htm.
[IETFKeywords] Key Words for Use in RFCs to Indicate Requirement Level, 1997-03, obtainable at http://www.ietf.org/rfc/rfc2119.txt.
[ISO8601] International Standard Date and Time, 2001-11-10, produced by ISO. A web page that explains the formats is http://www.cl.cam.ac.uk/~mgk25/iso-time.html.
[ISO11179] ISO 11179 Part 5 - Naming and Identification, 1995-12, produced by ISO, obtainable at http://fdr.faa.gov/iso/ISO11179page.htm. A later version is available from the ISO website.
[UKGuide] e-Government Schema Guidelines for XML, 2002-12, produced by United Kingdom e-Envoy, obtainable at http://www.e-envoy.gov.uk/Resources/Guidelines/fs/en.
[Unicode] Unicode Charts, available at http://www.unicode.org/charts/.
[W3CSchemaDatatypes] W3C Schema Datatypes, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-2.
[W3CNamespaces] Namespaces in XML, 1999-01-14, produced by W3C, obtainable at http://www.w3.org/TR/REC-xml-names/.
[W3CSchemaPrimer] W3C Schema Primer, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-0.
[W3CSchemaStructures] W3C Schema Structures, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-1.
[Xlink] W3C XLink Specification, 2001-06, produced by W3C, obtainable at http://www.w3.org/TR/xlink/.
[Xpath] W3C XPath Specification, 1999, produced by W3C, obtainable at http://www.w3.org/TR/xpath/.
[XSL] W3C XSL and XSLT Specifications, produced by W3C, obtainable at http://www.w3.org/Style/XSL/.
POSC references are available in the following formats:
[html] html format readable by browsers
[doc] MS Word 97/2000/XP
[sxw] OpenOffice writer, v1.0
[IntroModule] Introduction to Modules, Copyright 2002-2003. Available in [html], [doc], [sxw].
[BuildModule] Build a Module - a tutorial. Copyright 2003. Available in [html], [doc], [sxw].
[ImportModule] Importing Modules within your Modules. Copyright 2003. Available in [html], [doc], [sxw].
[Guidelines] Guidelines for XML Schemas, Version 2003. Copyright 2003. Available in [html], [doc], [sxw].
[ModulePolicies] Policies on Modules. Copyright 2002-2003. Available in [html], [doc], [sxw].
[ProfilesAppSchema] Modules, Profiles, and Application Schemas. Copyright 2002-2003. Available in [html], [doc], [sxw].
[XMLTables] XML Tables. Copyright 2003. Available in [html], [doc], [sxw].
[ReferenceData] Reference Data and Enumerated Lists Implemented in XML. Copyright 2002-2003. Available in [html], [doc], [sxw].
[Dictionaries] Examples of XML Dictionary Usage. Copyright 2003. Available in [html], [doc], [sxw]. Accompanied by sample code.
[Relationships] Relationships in XML. Copyright 2003. Available in [html], [doc], [sxw].
[UOMRecs] Unit of Measure Recommendations. Copyright 2002-2003. Available in [html], [doc], [sxw].
Note 2: Various terms are used for entities: tables, objects, entities. The usage generally varies with the modelling technique used (relational model or object model). The term entity is used in this paper to represent all of these concepts. Different terms may also be used for the relationship (association, containment); the term relationship is used to cover all of these concepts.
Note 3: The well is intended to include a wellbore. In this example, no distinction is made between them.
Note 4: The attributes href and id are not required to have those names. However, they are used consistently throughout this paper.
Note 5: A remote reference may also be called a skip reference.
2003-08-08 Relationships in XML