XML Tables

John I. Bobbitt

POSC

bobbitt@posc.org



License Agreement: © 2004, Petrotechnical Open Standards Consortium, Inc. All rights reserved. All access, receipt, and/or use of this document is subject to the POSC Product Licensing Agreement posted on the POSC Web site at http://www.posc.org/about/license.shtml.

Abstract: A table is an organized collection of data. A table differs from a module in that a table does not have an identity. This paper discusses four methods of developing schemas for tables.

1.Introduction

An XML module represents a business object, and has an identity [IntroModule]. An XML table is a tight collection of information that does not have a natural identity.

Examples of tables would be the monthly production for a well, the UKOOA P7/2000 set of data for a directional survey, and the formation tops for a given well. These are classified as “tables” because most of these sets of data can be organized into a table. However, many of the comments included here would apply to any complex set of data, such as a geodetic survey location or a public land survey location.

This paper will discuss four ways of defining tables. When it is recognized that the user community is dealing with a “table,” these four methods should be considered for the XML.

Section 8 mentions a different method (storing by columns) that is generally not appropriate.


2.Some Properties of a Table

A table is a generic term for any complex set of information. A table does not have a natural identifier - it is strictly a complex property of a business object. When giving the formation tops for a well, it does not make sense to give an identifier to the table of information. It takes its identity from the business object (the well, in this example) that it is a property of.

A table consists of more than one column of information. For example, the formation tops might have two columns (formation name, measured depth). In its most general case, there would be more than one row (or set) of these values. Thus, a formation tops table would be a set of (formation name, measured depth) pairs.

The meaning of the columns does not change from row to row. This means that I can give the meaning of each component once, and that meaning will hold for all sets (rows) of the data. For example, I can define the first column as the formation name, a string value, controlled by a fixed list of values appropriate for the state of Michigan. The second column is the measured depth in feet. With these meanings, all the pairs of values (in this particular XML instance) will be interpreted the same way: we will not give you some depths in metres, for example.

The methods are described from the most specific to the most general. The most specific method is easiest to read and write, but is not re-usable from one application to another. The most general will handle any table of information, but puts a burden on the reader and writer to define the table (the writer) and to understand it (the reader).

2.1.Example table

Here is a formation tops table (2 columns, 3 rows) that will be used in the examples.

Formation Tops    Measured Depths
Dundee            2217 ft
A Carbonate       2912 ft
Niagara           3016 ft


3.Method 1: Tag the Columns

With this method, every piece of data is tagged (see footnote 2). An example of the XML would look as follows:

<FormationTops>
  <MeasuredDepthUOM>ft</MeasuredDepthUOM>
  <Row>
    <Name>Dundee</Name>
    <Depth>2217</Depth>
  </Row>
  <Row>
    <Name>A Carbonate</Name>
    <Depth>2912</Depth>
  </Row>
  <Row>
    <Name>Niagara</Name>
    <Depth>3016</Depth>
  </Row>
</FormationTops>

Note that there are many variations to the above. For example, the element name, Row, is not very descriptive. Also, it should be noted that somewhere (in a specification about how to use the table) I have stated that the Depths are to be measured depths. I could, alternatively, have set the element name to MeasuredDepth.

The main thing to note is that all six of the data values in the basic table have tags. If there is a lot of data to transmit, it could lead to a very large file. On the other hand, it is quite easy to pick out any of the values using XPath [Xpath], for example. There is no requirement that the reading application be able to parse a string to separate the components (columns).

Let me make more comments about the specification of the table. The table was designed with Formation Tops in mind (hence, the element name). It was predefined as two columns - one for the formation name, and one for the depth. The specification would also allow the writer to use any unit of measure, as long as every depth is in the same unit. The unit of measure being used in a file is specified in a separate tag. Thus, the table has a little flexibility built into it. But its structure and meaning are very tightly defined.
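As an illustration of how directly a reader can address tagged values, here is a minimal sketch in Python using the standard library's ElementTree, which supports a limited subset of XPath. The function name read_tagged_tops and the conversion of depths to floating-point numbers are assumptions made for this illustration; they are not part of any POSC specification.

```python
import xml.etree.ElementTree as ET

# The Method 1 example instance from this section.
METHOD1_XML = """
<FormationTops>
  <MeasuredDepthUOM>ft</MeasuredDepthUOM>
  <Row><Name>Dundee</Name><Depth>2217</Depth></Row>
  <Row><Name>A Carbonate</Name><Depth>2912</Depth></Row>
  <Row><Name>Niagara</Name><Depth>3016</Depth></Row>
</FormationTops>
"""


def read_tagged_tops(xml_text):
    """Return (unit, [(name, depth), ...]) from a Method 1 instance.

    Hypothetical helper: depths are converted to float for convenience.
    """
    root = ET.fromstring(xml_text)
    unit = root.findtext("MeasuredDepthUOM")
    rows = [
        (row.findtext("Name"), float(row.findtext("Depth")))
        for row in root.findall("Row")
    ]
    return unit, rows


unit, tops = read_tagged_tops(METHOD1_XML)

# A single value can be picked out with one path expression; no string
# parsing of the row content is needed.
niagara_depth = ET.fromstring(METHOD1_XML).findtext("Row[Name='Niagara']/Depth")
```

No separator handling is needed anywhere in the reader, which is the convenience that Method 1 buys at the cost of file size.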


4.Method 2: Rows as Tuples

A second way to send the information is to build a tuple. Instead of having separate tags for each data value, a row at a time will be sent as a tuple. Here is an example of how the above would look:

<FormationTops separator="|">
  <MeasuredDepthUOM>ft</MeasuredDepthUOM>
  <Heading>Formation Name|Measured Depth</Heading>
  <Row>Dundee|2217</Row>
  <Row>A Carbonate|2912</Row>
  <Row>Niagara|3016</Row>
</FormationTops>


Because the table is very predictable, I can collapse the XML by outputting a row at a time. This makes for a more compact XML file. As with the previous case, I need to define the table structure in the specification, which means that this type is useful only for formation tops.

The main difference, of course, is that the reader needs to parse the data to obtain the separate Name and Depth values from a row, rather than relying on XPath [Xpath]. The particular implementation shown above allows the separator to be declared in order to aid with the parsing (see footnote 3).

Note that the meaning of the data is not clear to a user - she must guess that the first component is the name, and the second is the measured depth. The Heading element was added as an informative item so that the reader can more easily understand the meaning of the columns.

Thus, in Method 2, the reader must parse the information to obtain the data, but the meaning of the data is defined in the specification for the particular data type.

This method is useful for large amounts of data. However, a separate table type must be constructed for each use.
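The extra parsing step that Method 2 asks of the reader might look like the following sketch in Python. The function name read_tuple_tops, the fallback to "|" when no separator attribute is present, and the conversion of depths to floating-point numbers are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# The Method 2 example instance from this section.
METHOD2_XML = """
<FormationTops separator="|">
  <MeasuredDepthUOM>ft</MeasuredDepthUOM>
  <Heading>Formation Name|Measured Depth</Heading>
  <Row>Dundee|2217</Row>
  <Row>A Carbonate|2912</Row>
  <Row>Niagara|3016</Row>
</FormationTops>
"""


def read_tuple_tops(xml_text):
    """Return (headings, unit, rows) from a Method 2 instance."""
    root = ET.fromstring(xml_text)
    # Assumption: fall back to "|" if the writer did not declare a separator.
    sep = root.get("separator", "|")
    headings = root.findtext("Heading", default="").split(sep)
    unit = root.findtext("MeasuredDepthUOM")
    rows = []
    for row in root.findall("Row"):
        # The column order (name, depth) comes from the specification,
        # not from the instance; the Heading is informative only.
        name, depth = row.text.split(sep)
        rows.append((name.strip(), float(depth)))
    return headings, unit, rows


headings2, unit2, tops2 = read_tuple_tops(METHOD2_XML)
```

Note that the reader hard-codes the two-column layout, which reflects the point made above: the table type is still specific to formation tops.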


5.Method 3: Pre-defined tuples

Method 2 relies on a definition, in some document, of a tuple. In the example shown, the tuple is the pair (formation name, measured depth). We can use the same table by defining another tuple.

First, we give a name to the above tuple - say, MDTop. We can define another tuple, TVDTop, as the pair (formation name, true vertical depth), or even one with both columns (MDTVDTop): (formation name, measured depth, true vertical depth). If you are careful, you can define even more tuples (with names) to make even more use of a table (see footnote 4). In order to define the table, of course, we would need another element to say which tuple was being used.

<FormationTops separator="|">
  <TupleName>MDTop</TupleName>
  <MeasuredDepthUOM>ft</MeasuredDepthUOM>
  <Heading>Formation Name|Measured Depth</Heading>
  <Row>Dundee|2217</Row>
  <Row>A Carbonate|2912</Row>
  <Row>Niagara|3016</Row>
</FormationTops>

An alternative use of the table would have the true vertical depths:

<FormationTops separator="|">
  <TupleName>TVDTop</TupleName>
  <TVDepthUOM>ft</TVDepthUOM>
  <Heading>Formation Name|True Vertical Depth</Heading>
  <Row>Dundee|2208</Row>
  <Row>A Carbonate|2900</Row>
  <Row>Niagara|3003</Row>
</FormationTops>

Clearly the table is more flexible in its use. But it does put more of a burden on the reader and writer. The writer must recognize which tuple he is sending, and must declare it. In the example above, he must also choose the right unit of measure element to specify the unit of measure being used. The reader must be able to read any of the standard tuples that are defined in the specification (3 in this case). There will generally be a small, finite number of these tuples, which may not place too large a burden on the reader.
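One way a reader could cope with a small set of named tuples is a lookup table keyed by the tuple name. The sketch below is in Python; the registry TUPLES, the short column keys (name, md, tvd), and the idea that the MDTVDTop tuple carries both unit-of-measure elements are assumptions for illustration, since the paper does not define these details.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: for each standard tuple, the ordered columns and the
# element that carries the unit of measure for that column (None for strings).
TUPLES = {
    "MDTop": [("name", None), ("md", "MeasuredDepthUOM")],
    "TVDTop": [("name", None), ("tvd", "TVDepthUOM")],
    "MDTVDTop": [("name", None), ("md", "MeasuredDepthUOM"), ("tvd", "TVDepthUOM")],
}


def read_named_tuple_table(xml_text):
    """Return (tuple_name, units, rows) for any tuple in the registry."""
    root = ET.fromstring(xml_text)
    tuple_name = root.findtext("TupleName")
    if tuple_name not in TUPLES:
        raise ValueError(f"unknown tuple: {tuple_name!r}")
    columns = TUPLES[tuple_name]
    sep = root.get("separator", "|")
    # Collect the unit for each column that needs one.
    units = {col: root.findtext(tag) for col, tag in columns if tag}
    rows = []
    for row in root.findall("Row"):
        values = row.text.split(sep)
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(values)}")
        row_values = {}
        for (col, uom_tag), value in zip(columns, values):
            # Columns with a unit are treated as numbers, the rest as strings.
            row_values[col] = float(value) if uom_tag else value.strip()
        rows.append(row_values)
    return tuple_name, units, rows


# The true vertical depth example instance from this section.
TVD_XML = """
<FormationTops separator="|">
  <TupleName>TVDTop</TupleName>
  <TVDepthUOM>ft</TVDepthUOM>
  <Heading>Formation Name|True Vertical Depth</Heading>
  <Row>Dundee|2208</Row>
  <Row>A Carbonate|2900</Row>
  <Row>Niagara|3003</Row>
</FormationTops>
"""

name3, units3, rows3 = read_named_tuple_table(TVD_XML)
```

Adding a new standard tuple then means adding one registry entry, which keeps the reader's burden proportional to the number of tuples in the specification.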

The main cost to this method is the final clarification of some of the components. In the example we have been using, we have always left it to the writer to specify the unit of measure, which is the missing bit of information about the tuple. This may get very complicated to do if the tuple is large. For example, the tuple for the directional survey (UKOOA P7/2000) may be up to 13 components long. Many of these need a unit of measure. Some of them need a reference to a coordinate reference system (pairs of components, actually, in this case). Thus, the final clarification of these components may be very complicated. If this method is used, some thought would need to be given to this final component clarification.


6.Method 4: General Tuple definition

It is possible to define a single table structure that would be used anywhere. This means that the table must be defined in toto within the XML instance file itself. When there is little or no knowledge of the structures of the tables to be recorded, this may be the best solution. This may also be the solution if a table structure will only be used a few times, rather than going to the trouble of defining its structure specifically, as is done in the first three methods.

Here is a sample of how the XML might look. Notice that each component must be defined.

<FormationTops separator="|" columnCount="2">
  <Component column="1">
    <Name>Formation Name</Name>
    <DataType>string</DataType>
    <Description>The name of the formation the wellbore penetrates</Description>
  </Component>
  <Component column="2">
    <Name>Measured Depth</Name>
    <DataType>quantity</DataType>
    <UnitOfMeasure>ft</UnitOfMeasure>
    <Description>The measured depth to the top of the formation</Description>
  </Component>
  <Row>Dundee|2217</Row>
  <Row>A Carbonate|2912</Row>
  <Row>Niagara|3016</Row>
</FormationTops>

Since the reader knows nothing about the table beforehand, she must be able to “build” the table from the Component information and from the Rows of data being given. It therefore becomes very important to build a common way to describe the table (i.e., how the Component elements are written). The reader can understand any table if she understands the Components.
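A generic reader for this kind of self-describing table could be sketched as follows in Python. The function name read_general_table, the rule that a DataType of quantity maps to a floating-point number while every other type is kept as a string, and the consistency check against columnCount are assumptions for illustration, not a defined POSC behaviour.

```python
import xml.etree.ElementTree as ET

# The Method 4 example instance from this section.
METHOD4_XML = """
<FormationTops separator="|" columnCount="2">
  <Component column="1">
    <Name>Formation Name</Name>
    <DataType>string</DataType>
    <Description>The name of the formation the wellbore penetrates</Description>
  </Component>
  <Component column="2">
    <Name>Measured Depth</Name>
    <DataType>quantity</DataType>
    <UnitOfMeasure>ft</UnitOfMeasure>
    <Description>The measured depth to the top of the formation</Description>
  </Component>
  <Row>Dundee|2217</Row>
  <Row>A Carbonate|2912</Row>
  <Row>Niagara|3016</Row>
</FormationTops>
"""


def read_general_table(xml_text):
    """Build (columns, rows) using only what the instance itself declares."""
    root = ET.fromstring(xml_text)
    sep = root.get("separator", "|")
    # Order the Component elements by their declared column number.
    components = sorted(root.findall("Component"), key=lambda c: int(c.get("column")))
    columns = [
        {
            "name": c.findtext("Name"),
            "type": c.findtext("DataType"),
            "uom": c.findtext("UnitOfMeasure"),  # None when no unit is given
        }
        for c in components
    ]
    if len(columns) != int(root.get("columnCount")):
        raise ValueError("columnCount does not match the number of Components")
    rows = []
    for row in root.findall("Row"):
        values = row.text.split(sep)
        if len(values) != len(columns):
            raise ValueError("row length does not match the column definitions")
        # Assumption: "quantity" means a number; anything else stays a string.
        rows.append(tuple(
            float(v) if col["type"] == "quantity" else v
            for col, v in zip(columns, values)
        ))
    return columns, rows


columns4, rows4 = read_general_table(METHOD4_XML)
```

Nothing in this reader mentions formation tops, which is the point of Method 4: the same code would read any table whose Components follow the same conventions.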


7.Things to Consider

This section lists various things to consider with the above methods.

7.1.A table is a property

A table is a property of a business object. It has no meaning alone. It is only meaningful when it is associated with a business object.

The way this is done cannot be generalized. Here are some ways to do this:

  1. The table is a standalone XML instance, with a reference to the object it is associated with.

  2. The table is part of a standalone Activity instance, which is referenced to an object.

  3. The table has a referenceable id attribute. The object associated with this table can reference the table.

  4. The table is a child of the object it is associated with.

The three way relationship - A Property of an Object is generated by an Event - allows various ways to implement the above associations.

The application would, in the end, determine which choice is used.

7.2.Some components of tuples come in pairs

When building a tuple for Method 4, it is possible that some components come in pairs (or more) of values. For example, the directional survey table of the UKOOA P7/2000 format has a 13 component tuple. Among the components are pairs for the latitude and longitude, for the easting and northing, and for the offset north and east. Defining the tuple in a general way that can describe these pairs is necessary, and will take some thought.


8.Storing by Columns

A column would consist of all the values of a single component. A sample XML would be

<FormationTops>
  <Column name="Formation name" type="string" separator="|">
    Dundee|A Carbonate|Niagara</Column>
  <Column name="Depth" type="quantity" uom="ft" separator="|">
    2217|2912|3016</Column>
</FormationTops>

Alternatively, each column could have its own element name:

<FormationTops>
  <Name separator="|">
    Dundee|A Carbonate|Niagara</Name>
  <Depth uom="ft" separator="|">
    2217|2912|3016</Depth>
</FormationTops>

Note that somewhere, somehow, you must give information about the columns. In the first example above, it is all given in the attributes of the Column element. There is enough information for the reader to “form” the table, with headings.

But there is a key difficulty. The reader must re-align the columns. We have broken up the tuple, and have given it to you a component at a time. There is the implication that the nth values of the columns are all part of the same tuple. The reader must regain this relationship to get the tuples correctly formed.

In the above example, this is rather easy. The main problem comes when a column has one or more missing values. A method of handling these null values must be explicitly developed so that the rows can be reformed.
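To make the re-alignment problem concrete, here is a small sketch in Python. It adopts one possible null convention, namely that an empty token between two separators marks a missing value; that convention, the function name realign_columns, and the modified instance with a missing depth for A Carbonate are all assumptions for illustration and are not defined by this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the second depth is missing, written as an empty token.
COLUMN_XML = """
<FormationTops>
  <Column name="Formation name" type="string" separator="|">
    Dundee|A Carbonate|Niagara</Column>
  <Column name="Depth" type="quantity" uom="ft" separator="|">
    2217||3016</Column>
</FormationTops>
"""


def realign_columns(xml_text):
    """Re-form rows from columns; an empty token is read as None."""
    root = ET.fromstring(xml_text)
    names, columns = [], []
    for col in root.findall("Column"):
        sep = col.get("separator", "|")
        tokens = (col.text or "").strip().split(sep)
        if col.get("type") == "quantity":
            values = [float(t) if t.strip() else None for t in tokens]
        else:
            values = [t.strip() or None for t in tokens]
        names.append(col.get("name"))
        columns.append(values)
    # If the columns differ in length, the nth-value relationship is lost.
    if len({len(c) for c in columns}) > 1:
        raise ValueError("columns differ in length, rows cannot be re-formed")
    return names, list(zip(*columns))


names5, rows5 = realign_columns(COLUMN_XML)
```

Without an agreed convention of this kind, a writer who simply omitted the missing depth would produce a two-value column, and the reader could not tell which formation it belonged to.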

Storing by Columns remains a viable method, but is not recommended at this time.


9.References

9.1.Outside References

[ANSIX12] X12 Reference Model for XML Design, 2002-10, produced by the ANSI X12 committee, obtainable at http://www.x12.org/x12org/.

[BestPractices] Best Practices Homepage, developed and maintained by XML-dev and Mitre, obtainable at http://www.xfront.com/BestPracticesHomepage.html.

[ComProServ] PIDX XML Standards Master, Version 1.0, RP 3901, produced by PIDX, obtainable at http://committees.api.org/business/pidx/standards.htm.

[EBCCNAM] ebXML RT - Naming Convention for Core Component, 2001-05-10, produced by the ebXML group, obtainable at http://www.ebxml.org/specs/index.htm#technical_reports.

[ebTechArch] ebXML Technical Architecture Specification V1.0.4, 2001-02-16, produced by the ebXML group, obtainable at http://www.ebxml.org/specs/ebTA.pdf.

[FedDevGuide] Draft Federal XML Developer's Guide, 2002-04 (work in progress), produced by the Federal CIO Council, obtainable at http://xml.gov/documents/in_progress/developersguide.pdf.

[FedTagStds] Federal Tag Standards for Extensible Markup Language, 2001-06, produced by LMI, not obtainable from the internet.

[HKGuide] XML Schema Design and Management Guide, (4 parts), Draft versions dated in summer, 2003. Produced by Hong Kong Information Services Technology Division. Available at http://www.itsd.gov.hk/itsd/english/infra/eif.htm.

[IETFKeywords] Key Words for Use in RFCs to Indicate Requirement Level, 1997-03, obtainable at http://www.ietf.org/rfc/rfc2119.txt.

[ISO8601] International Standard Date and Time, 2001-11-10, produced by ISO. A web page that explains the formats is http://www.cl.cam.ac.uk/~mgk25/iso-time.html.

[ISO11179] ISO 11179 Part 5 - Naming and Identification, 1995-12, produced by ISO, obtainable at http://fdr.faa.gov/iso/ISO11179page.htm. There is a later version that is available from the ISO website.

[UKGuide] e-Government Schema Guidelines for XML, 2002-12, produced by United Kingdom e-Envoy, obtainable at http://www.e-envoy.gov.uk/Resources/Guidelines/fs/en.

[Unicode] Unicode Charts, available at http://www.unicode.org/charts/.

[W3CSchemaDatatypes] W3C Schema Datatypes, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-2.

[W3CNamespaces] Namespaces in XML, 1999-01-14, produced by W3C, obtainable at http://www.w3.org/TR/REC-xml-names/.

[W3CSchemaPrimer] W3C Schema Primer, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-0.

[W3CSchemaStructures] W3C Schema Structures, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-1.

[Xlink] W3C XLink Specification, 2001-06, produced by W3C, obtainable at http://www.w3.org/TR/xlink/.

[Xpath] W3C XPath Specification, 1999, produced by W3C, obtainable at http://www.w3.org/TR/xpath/.

[XSL] W3C XSL and XSLT Specifications, produced by W3C, obtainable at http://www.w3.org/Style/XSL/.

9.2.POSC References

POSC references are available in the following formats:

[html] html format readable by browsers

[doc] MS Word 97/2000/XP

[sxw] OpenOffice writer, v1.0

[IntroModule] Introduction to Modules, Copyright 2002-2003. Available in [html], [doc], [sxw].

[BuildModule] Build a Module - a tutorial. Copyright 2003. Available in [html], [doc], [sxw].

[ImportModule] Importing Modules within your Modules. Copyright 2003. Available in [html], [doc], [sxw].

[Guidelines] Guidelines for XML Schemas, Version 2003. Copyright 2003. Available in [html], [doc], [sxw].

[ModulePolicies] Policies on Modules. Copyright 2002-2003. Available in [html], [doc], [sxw].

[ProfilesAppSchema] Modules, Profiles, and Application Schemas. Copyright 2002-2003. Available in [html], [doc], [sxw].

[XMLTables] XML Tables. Copyright 2003. Available in [html], [doc], [sxw].

[ReferenceData] Reference Data and Enumerated Lists Implemented in XML. Copyright 2002-2003. Available in [html], [doc], [sxw].

[Dictionaries] Examples of XML Dictionary Usage. Copyright 2003. Available in [html], [doc], [sxw]. Accompanied by sample code.

[Relationships] Relationships in XML. Copyright 2003. Available in [html], [doc], [sxw].

[UOMRecs] Unit of Measure Recommendations. Copyright 2002-2003. Available in [html], [doc], [sxw].



2. In this paper, the tags will always be elements. You could, of course, develop the tables with attributes, rather than elements.

3. It is not that difficult to parse one of these rows in XSL [XSL]. However, it is an additional step.

4. For example, we could name the main element Markers, instead of FormationTops. This would allow us to also include depths of markers other than just tops. We would add a third column which could be one of {top, base, fluid contact, etc.}.



2003-07-22 XML Tables