XML Patterns and Objects

For more up-to-date information on XML projects, go to the XML Modules area.

Introduction

One of the major contributions that POSC can make to the oil and gas industry is to define standard objects for the transfer of data and information. When XML became a standard means of transfer, POSC began studying how to implement such a process. Specifying standard objects with DTDs proved too restrictive. XML Schema allows these objects to be specified in such a way that they may be reused in standard exchange documents.

Go to the W3C page for information on XML Schema.

In developing these standard objects, it became apparent that there should also be consistent ways to develop the objects. These ways are grouped under the general heading of patterns, although it is difficult to separate the developed objects from the patterns. By combining these patterns with the objects that are being developed, it is becoming possible to more easily generate standardized exchange sets - even though these standardized sets are built for a particular use. In effect, we are finding it common to standardize on the pieces rather than on the whole.

The process of using pre-built modules involves collecting them from one or more web sites for use within a new application schema. When doing this, the schema developer jumps headlong into the issue of schema namespaces. The Namespace tutorial document will give some of the requirements necessary to use the POSC schemas.

Although somewhat of a side issue, it may be interesting to review the XML Schema: Best Practices. Much of the schema development discussed in that web site has been practiced here. Please be warned that this site is probably far beyond anyone's need to know.

Goals of this project

There are three goals that are driving this project:

  1. Develop a set of classes and patterns for how to use them.
  2. Set up a process that allows another group to develop a schema specification at another site, while using the objects that are available at the POSC web site.
  3. Set up the classes so that the complexity of the schema definitions (namespaces, in particular) is transparent to the generator of the XML.

As mentioned above, the development of XML Schema is helping to meet these goals. However, the implementations of the XML Schema recommendation are not yet fully compatible. That is, there are some things that can be done in XML Schema that are not allowed by the present implementations of schema checking. The procedures recommended in this document are designed to work with those implementations. It is hoped that future implementations will make some of the rigmarole unnecessary. Thus, the recommended procedures for incorporating these objects may change with time.

Units of Measure

Note: Due to substantial work since January 2001, the section on units of measure has been significantly revised.

Further Note: The Units of Measure dictionaries have been instantiated. The latest dictionary, effective on 2005-01-01, is found at http://www.posc.org/refs/poscUnits20.xml.

Units of Measure are a common problem in all exchange data sets. There are many approaches that are in use for specifying the units for a particular quantity. POSC, in conjunction with several other organizations, has developed a particular method for handling units that works in a very general sense. It is recommended that all exchange sets conform to this pattern. If exchange set developers conform to this pattern, it will be easier for applications to handle it.

Several documents and web sites have been developed in the past 5 months that describe the problem, propose solutions, and apply the solutions to particular problems. Although the text below can give an indication of the problems and solutions, you should go to the referenced documents to get a detailed picture of the units of measure problems and solutions.

The initial effort was to put forth the problem. The "Problems" document outlined several methods by which units of measure were being handled in XML sets. These were all taken from actual examples.

  • You can download the Units of Measure problems document, which explains the issues.

Several organizations became involved in an email discussion on best practices. From these discussions came the "Recommendation" document, which details several recommendations and patterns that allow interoperability when dealing with units of measure.

  • You can download the final Units of Measure recommendation document (May 8 version), which proposes the recommended solutions.

The document was worked on actively and went through several revisions. So that active workers could keep track of the changes being made, an earlier version of the recommendations document is available.

  • You can download the earlier recommendation document.

Supporting the recommendation is the XML Schema. Part of the recommendation is also that a units dictionary be developed. To support the recommendation, a sample XML file with an XSL style sheet was developed. The XSL demonstrates how to access the information in the dictionary files.

    Finally, it is useful to see a schema that incorporates the units of measure pattern into it. See The CSIRO site web page.

    Following is a brief description of the units of measure problem and recommendations.

    A list of ways that units are handled is:

    1. Ignore. Assume everyone knows what units are being used.
    2. Specify a particular set of units to use.
    3. Insert units into tag names, e.g. DepthInFeet.
    4. Specify a set of well-known unit symbols.
    5. Define the unit whenever it is needed in the file.
    6. Define the units somewhere, and reference them when needed.

    The problems document gives the advantages and disadvantages of each method. It is this last method that is used in POSC.

    The basic pattern is to define each unit once, and then to reference that definition wherever a value needs a unit.

    Here is a sample of how this is done. The sample will define the metre very simply, and then will define the US Survey foot by giving the conversion values to a metre.

      <UnitsDefinition>
       <UnitOfMeasure uid="m"/>
       <UnitOfMeasure uid="ft" acronym="US ft">US Survey foot
         <ConversionToBaseUnit baseUnit="#m">
           <numerator>12.</numerator>
           <denominator>39.37</denominator>
         </ConversionToBaseUnit>
       </UnitOfMeasure>
      </UnitsDefinition>
    

    Since the units are defined, they can be referenced. In this case, only the "m" and the "ft" can be used to reference units. The sample below shows the referencing:

       <Ellipsoid flatteningDefinitive="no">
        <identifier>Clarke 1858</identifier>
        <semiMajorAxis uom="#ft">20926348</semiMajorAxis>
        <semiMinorAxis uom="#ft">20855233</semiMinorAxis>
       </Ellipsoid>
       
       <Ellipsoid>
        <identifier>Bessel Namibia</identifier>
        <semiMajorAxis uom="#m">6377483.865</semiMajorAxis>
        <inverseFlattening>299.1528128</inverseFlattening>
       </Ellipsoid>
    

    Note that the data itself is easy to read. Furthermore, the meaning of the units m and ft is clearly defined in the file. POSC recommends this pattern for handling units.
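
As an illustration of how a receiving application might use this pattern, here is a minimal Python sketch. It is not part of the POSC recommendation. It assumes that the two samples above sit in one document under a hypothetical ExchangeSet root, that a uom value is "#" followed by the uid of a unit, and that a value is converted to the base unit by multiplying by numerator/denominator.

```python
import xml.etree.ElementTree as ET

# The samples above, wrapped in a hypothetical <ExchangeSet> root so that
# they form a single well-formed document.
DOC = """
<ExchangeSet>
  <UnitsDefinition>
    <UnitOfMeasure uid="m"/>
    <UnitOfMeasure uid="ft" acronym="US ft">US Survey foot
      <ConversionToBaseUnit baseUnit="#m">
        <numerator>12.</numerator>
        <denominator>39.37</denominator>
      </ConversionToBaseUnit>
    </UnitOfMeasure>
  </UnitsDefinition>
  <Ellipsoid flatteningDefinitive="no">
    <identifier>Clarke 1858</identifier>
    <semiMajorAxis uom="#ft">20926348</semiMajorAxis>
    <semiMinorAxis uom="#ft">20855233</semiMinorAxis>
  </Ellipsoid>
</ExchangeSet>
"""


def load_units(root):
    """Map each unit uid to (base unit uid, factor to the base unit)."""
    units = {}
    for unit in root.iter("UnitOfMeasure"):
        uid = unit.get("uid")
        conv = unit.find("ConversionToBaseUnit")
        if conv is None:
            # No conversion given: treat the unit as its own base unit.
            units[uid] = (uid, 1.0)
        else:
            # Assumption: value in base unit = value * numerator / denominator.
            factor = float(conv.findtext("numerator")) / float(conv.findtext("denominator"))
            units[uid] = (conv.get("baseUnit").lstrip("#"), factor)
    return units


def to_base(element, units):
    """Resolve the element's uom reference and convert its value."""
    base, factor = units[element.get("uom").lstrip("#")]
    return float(element.text) * factor, base


root = ET.fromstring(DOC)
units = load_units(root)
value, unit = to_base(root.find("Ellipsoid/semiMajorAxis"), units)
print(f"{value:.3f} {unit}")
```

Because the units are defined in the file itself, the application needs no built-in table of unit symbols. It only follows the reference from the value to the definition.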

    The specification

    The XML Schema, technical comments on it, examples, etc. are given in the Unit of Measure Recommendations paper referenced earlier. Examples of its use are also given in the other documents referenced above.

    Useful Data Types

    When developing schemas, some data types show up often enough that they can be abstracted out and used in other schemas. Following is a list of such data types, with a link to a more detailed description of each:

    In addition, there are types that have been defined to support the Epicentre mapping. These will probably only be useful in the context of transferring Epicentre datatypes. They are listed here:

    Data Types

    The goals of the PEF XML project were to develop a method for mapping Epicentre into an XML exchange document. That goal was accomplished in part by mapping each of the Epicentre data types into XML. The document, Exchange Format, details this mapping.

    Because of the methodology of this mapping, it is also possible to apply it to other data models. Any data model that uses the Epicentre data types (or a subset of the Epicentre data types) can use the information in this document to form an XML file.

    Both of these goals are mentioned in the Exchange Format document. This note goes a step beyond that to discuss the use of the data types themselves.

    It is possible to use the Epicentre data types independently of any data model. For example, the timestamp, the date, the quantity, the complex number, etc. structures may be useful outside of any data model that specifically uses these data types.

    Consider, for example, the tag, "spudDate." How should a spud date be represented in XML? One possibility is to make it a parsed date, as was done in the data types schema:

     <element name="spudDate" type="parsedDate"/>
    
    which would lead to XML such as
     <spudDate>
      <year>1999</year>
      <month>6</month>
      <day>24</day>
     </spudDate>
    

    Use of this predefined structure not only makes the schema document easier to develop and understand, but it also increases the interoperability. If all applications that needed a parsed date structure were to use this data type, the structure would be well-defined and well-understood. Applications that understand this structure could then be reused with other XML documents.

    Another example of its use would be if the schema developer wishes to give a choice of date formats. For example, she may wish to give the option of a parsed date, an ISO formatted date, or a US formatted date. She could then define the spudDate element as

     <element name="spudDate">
      <complexType>
       <choice>
        <element ref="pef:date"/>
        <element name="isoDate" type="string"/>
        <element name="USDate" type="string"/>
       </choice>
      </complexType>
     </element>
    
    where the "date" element is already defined in the data types document, and the other two date types can be defined in the present document.

    Details on use of the Data Types and the data type elements can be found by referring to the document, Data Type Usage.
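
To show what the choice of date formats described above means for a receiving application, here is a small Python sketch. It is an illustration only. The namespace URI bound to the pef prefix is hypothetical, the pef:date element is assumed to carry year, month, and day children like the parsed date, and isoDate and USDate are assumed to hold YYYY-MM-DD and MM-DD-YYYY strings respectively.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

PEF_NS = "urn:example:pef"  # hypothetical namespace URI, for illustration only


def read_spud_date(elem):
    """Return (format name, date) for whichever alternative is present."""
    child = elem[0]  # the choice allows exactly one child element
    if child.tag == f"{{{PEF_NS}}}date":
        # Assumed structure: year/month/day children; namespaces are ignored.
        parts = {c.tag.rsplit("}", 1)[-1]: int(c.text) for c in child}
        return "parsed", date(parts["year"], parts["month"], parts["day"])
    if child.tag == "isoDate":
        return "iso", datetime.strptime(child.text.strip(), "%Y-%m-%d").date()
    if child.tag == "USDate":
        return "us", datetime.strptime(child.text.strip(), "%m-%d-%Y").date()
    raise ValueError(f"unexpected date element: {child.tag}")


samples = [
    f'<spudDate xmlns:pef="{PEF_NS}"><pef:date><year>1999</year>'
    "<month>6</month><day>24</day></pef:date></spudDate>",
    "<spudDate><isoDate>1999-06-24</isoDate></spudDate>",
    "<spudDate><USDate>06-24-1999</USDate></spudDate>",
]
results = [read_spud_date(ET.fromstring(s)) for s in samples]
for fmt, value in results:
    print(fmt, value.isoformat())
```

Each alternative the schema allows becomes another branch that every reader must implement, which is one reason to keep the list of accepted formats short.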

    Dates

    Dates can be in one of five forms:

    1. Parsed date
    2. ISO date
    3. W3C formats
    4. Locally defined format
    5. Undefined text

    Each of these is explained below.

    Parsed date

    The date is broken into its parts. There are separate tags for year, month, and day. As presently constituted, the year is four digits, the month is one or two digits (1-12), and the day is one or two digits (1-31). Direct use of the parsedDate data type will give this structure. Note that the parsedDate data type accepts three combinations: year, month, and day; year and month; or year only.
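
The rules above can be sketched as a small Python reader. This is an illustration under stated assumptions, not the schema's own validation: the child elements are assumed to be named year, month, and day as in the spudDate sample, and the check of the day against the length of the month is an extra that goes beyond the 1-31 range given above.

```python
import calendar
import xml.etree.ElementTree as ET


def read_parsed_date(elem):
    """Return (year, month, day); month and day are None when absent."""
    year = elem.findtext("year")
    month = elem.findtext("month")
    day = elem.findtext("day")
    if year is None or len(year.strip()) != 4 or not year.strip().isdigit():
        raise ValueError("year must be four digits")
    if day is not None and month is None:
        raise ValueError("a day requires a month")
    y = int(year)
    m = int(month) if month is not None else None
    d = int(day) if day is not None else None
    if m is not None and not 1 <= m <= 12:
        raise ValueError("month must be 1-12")
    # Extra check (not stated in the text): the day must exist in that month.
    if d is not None and not 1 <= d <= calendar.monthrange(y, m)[1]:
        raise ValueError("day out of range for month")
    return y, m, d


full = read_parsed_date(ET.fromstring(
    "<spudDate><year>1999</year><month>6</month><day>24</day></spudDate>"))
year_only = read_parsed_date(ET.fromstring("<spudDate><year>1999</year></spudDate>"))
print(full, year_only)
```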

    ISO Date

    The ISO date is defined by ISO 8601. See the Summary Web Page for a description of the ISO date format. In essence, it is of the form YYYY-MM-DD. XML Schema implements the full format in its date data type.

    W3C Formats

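    The W3C XML Schema defines the following date data types: date, year, month, century, recurring date, and recurring day. These correspond to various portions of the date: full date, year only, year and month only, century only, month and day only, and day only. (These are the names used in the draft of the specification. The final XML Schema recommendation names the partial types gYear, gYearMonth, gMonthDay, and gDay, and has no century type.) These may be used (and combined) as needed. Note that the full ISO date format allows all of these choices also.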

    Locally defined format

    There are formats defined locally. The two major ones are the US format (MM-DD-YYYY) and the European format (DD-MM-YYYY). Note also that locally defined formats may replace the "-" with "/". Clearly, dates in these formats are ambiguous, which is why ISO defined a standard format. However, these formats may be used, provided the document details the format and its meaning.

    Undefined text

    In some cases, a date is given with no knowledge of its meaning. Clearly, 06/02/98 is ambiguous. However, the undefined text format allows dates of this format to be exchanged, with the understanding that the meaning of it is unknown. This often occurs when the input file uses such an undefined format and the user wants to keep the information, but has no way of interpreting it.
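
The ambiguity is easy to demonstrate. The short Python sketch below reads the same text under the US and the European conventions. Both readings succeed, so nothing in the text itself tells the receiver which one was meant.

```python
from datetime import datetime

raw = "06/02/98"
as_us = datetime.strptime(raw, "%m/%d/%y").date()        # month first
as_european = datetime.strptime(raw, "%d/%m/%y").date()  # day first

# Prints: 1998-06-02 1998-02-06
print(as_us.isoformat(), as_european.isoformat())
```

Even the century is a guess here: Python's two-digit year rule happens to map 98 to 1998, but the text itself does not say so.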

    Usage Guidelines

    Whenever a date is to be included in an exchange set, the specification document must choose one or more of the alternatives above. While the parsedDate and the W3C Formats are predefined for users, the others must be specified either using XML Schema patterns or text descriptions in a written document. It is clearly up to the users to decide which formats to allow and how to specify them.

    However, there are guidelines that should be understood and followed.

    1. The parsed date is most easily manipulated by XML style sheets and other applications. But it may require a formatted date to be parsed before the information can be expressed in XML.
    2. The ISO date is the most general format. Generally, though, even the ISO date must be restricted to reduce the available options (for example, an application may not allow the week format, 95W05, or the "non-dash" format, 19951206). Use of the ISO date should generally list the acceptable formats.
    3. The W3C formats can be used. It is generally appropriate to decide which of the types should be accepted, then form a union of the acceptable ones.
    4. The local formats must be accompanied by a strict definition of the format and its meaning. POSC recommends against the use of such formats, since each specification must have its own software developed to interpret the meaning of the date.
    5. The undefined text should only be used when a date is taken from a legacy document, and "dumped" into the XML exchange document. However, it should be understood that this method does not solve the date problem - it only puts it off until a later time.
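
As an example of the second guideline, an application that accepts only the YYYY-MM-DD form of the ISO date could check the text before interpreting it. The Python sketch below is one possible way to do this. The function name is hypothetical, and an exchange specification could state the same restriction as an XML Schema pattern instead.

```python
import re
from datetime import date

# Accept only the extended calendar form, YYYY-MM-DD.
ISO_EXTENDED = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_restricted_iso(text):
    """Parse YYYY-MM-DD; reject every other ISO 8601 variant."""
    match = ISO_EXTENDED.fullmatch(text)
    if match is None:
        raise ValueError(f"not in YYYY-MM-DD form: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)  # raises ValueError for dates such as 1995-02-30


print(parse_restricted_iso("1995-12-06").isoformat())
for rejected in ("95W05", "19951206"):
    try:
        parse_restricted_iso(rejected)
    except ValueError as err:
        print("rejected:", err)
```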

    Using strings for Months

    Because of language differences, the use of the string for a month (JUN instead of 06) is discouraged.

    It should be noted that style sheets and other applications can convert the standard date formats into readable strings, such as 24 Dezember 2001, or December 24, 2001. Readability can be added at many stages. However, the meaning must be clear, and interoperable, in the exchange set.
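
The following Python sketch illustrates the point: the exchange set carries only the numeric date, and each application adds a readable form for its own audience. The month-name tables and the two layouts are assumptions chosen to reproduce the strings above.

```python
from datetime import date

# Display names are kept out of the exchange set and applied only on output.
MONTH_NAMES = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "de": ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
           "August", "September", "Oktober", "November", "Dezember"],
}


def render(d, lang):
    """Format a date for reading; the stored value stays numeric."""
    name = MONTH_NAMES[lang][d.month - 1]
    if lang == "en":
        return f"{name} {d.day}, {d.year}"
    return f"{d.day} {name} {d.year}"


exchanged = date(2001, 12, 24)
print(render(exchanged, "de"))  # 24 Dezember 2001
print(render(exchanged, "en"))  # December 24, 2001
```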

    Abstract Location

    Wells, leases, fields, buildings all have locations. The means of giving a location to these features varies - depending on the object and the requirements of the receiver. The AbstractLocation data type is a combination of four methods of giving a location:

    1. A geopolitical location (state, country, county)
    2. A legal description (township, range, etc.)
    3. Offshore location (block name, area name)
    4. Survey location (geographic latitude/longitude or projected coordinates)

    The AbstractLocation data type is an attempt to gain interoperability by using a single structure whenever any of the above kinds of location is needed. Here is an example of its use:

      XML Schema:
      <element name="WellLocation" type="posc:AbstractLocation"/>
      <element name="BottomholeLocation" type="posc:AbstractLocation"/>
    
      Sample XML:
      <WellLocation status="actual">
       <GeopoliticalLocation>
        <country code="US">United States</country>
        <state>Texas</state>
        <county>Val Verde</county>
       </GeopoliticalLocation>
       <SurveyLocation>
        <srsName>NAD 83</srsName>
        <gml:location>
         <gml:Point srsName="epsg:4267">
          <gml:coordinates>27.2529953,-101.966394</gml:coordinates>
         </gml:Point>
        </gml:location>
       </SurveyLocation>
      </WellLocation>
      ...other information...
      <BottomholeLocation status="proposed">
       <SurveyLocation>
        <srsName>NAD 83</srsName>
        <gml:location>
         <gml:Point srsName="epsg:4267">
          <gml:coordinates>27.2529681,-101.984422</gml:coordinates>
         </gml:Point>
        </gml:location>
       </SurveyLocation>
      </BottomholeLocation>
    
    

    In addition to the full AbstractLocation object, there are intermediate objects that can be used (or restricted). For example, there is an offshoreLocation object. If the full AbstractLocation object does not meet the needs of the application, a lower level object can be chosen. The lower level objects, and how to incorporate them into the schema, are described in the document on Usage of Location Object.
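
Because every location uses the same structure, one reader can serve WellLocation, BottomholeLocation, and any other element of this type. The Python sketch below illustrates this for the WellLocation sample above. It assumes that the gml prefix is bound to the GML namespace http://www.opengis.net/gml, which the sample does not declare, and it reads only the geopolitical and survey parts. The two coordinate values are returned in document order, with no assumption about which is latitude.

```python
import xml.etree.ElementTree as ET

# Assumption: the gml prefix refers to the GML namespace.
NS = {"gml": "http://www.opengis.net/gml"}

DOC = """
<WellLocation status="actual" xmlns:gml="http://www.opengis.net/gml">
 <GeopoliticalLocation>
  <country code="US">United States</country>
  <state>Texas</state>
  <county>Val Verde</county>
 </GeopoliticalLocation>
 <SurveyLocation>
  <srsName>NAD 83</srsName>
  <gml:location>
   <gml:Point srsName="epsg:4267">
    <gml:coordinates>27.2529953,-101.966394</gml:coordinates>
   </gml:Point>
  </gml:location>
 </SurveyLocation>
</WellLocation>
"""


def summarize_location(elem):
    """Collect whichever of the location methods are present."""
    summary = {"status": elem.get("status")}
    geo = elem.find("GeopoliticalLocation")
    if geo is not None:
        summary["geopolitical"] = {child.tag: child.text for child in geo}
    point = elem.find("SurveyLocation/gml:location/gml:Point", NS)
    if point is not None:
        text = point.findtext("gml:coordinates", namespaces=NS)
        first, second = (float(v) for v in text.split(","))
        summary["survey"] = {"srs": point.get("srsName"),
                             "coordinates": (first, second)}
    return summary


location = summarize_location(ET.fromstring(DOC))
print(location)
```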


    Last Modified: 2003-04-25
    © Copyright 2001 POSC. All rights reserved.