Guidelines for XML Schema

Version 2003

John I. Bobbitt

POSC

bobbitt@posc.org



License Agreement: © 2004, Petrotechnical Open Standards Consortium, Inc. All rights reserved. All access, receipt, and/or use of this document is subject to the POSC Product Licensing Agreement posted on the POSC Web site at http://www.posc.org/about/license.shtml.

Abstract: Guidelines and recommended practices are given for development of XML Schemas within the POSC framework.

Modified 2004-03-09. Revised the statement of agreement with the UK Guide to state that POSC policy is that element names are UpperCamelCase and attribute names are lowerCamelCase.

1. Introduction

A developer of XML schemas is faced with many different style and implementation decisions in the development process. A set of guidelines will help the developer eliminate many of these choices, and lead to more consistent schemas. They will also lead to more consistent XML instance documents based on the schemas.

This project was initiated in 2001. At that time, there were a dozen or so similar documents by different groups. The goal of the POSC document was to find where consensus existed in many of the issues, and to document and follow these consensus best practices. The first version of this document discussed and codified some of these issues.

In general, usage of XML schemas was too new to have a consensus of best practices. Most of the other guidelines documents were struggling to understand the choices, and to develop their guidelines. Since then, there has been more understanding of the choices, and consensus in some of the areas.

1.1. The UK e-Government Schema Guidelines

Rather than repeat all of these items in 'yet another guidelines document,' we will adopt a particular document, and comment briefly on variations from that document. The adopted document is e-Government Schema Guidelines for XML, Version 3.0, dated December 2002, hereinafter referred to as the 'UK Guide.' It is published by the Office of the e-Envoy of the United Kingdom, and is available at http://www.e-envoy.gov.uk/Resources/Guidelines/fs/en (see [UKGuide]). It is recommended that the reader of this paper also have that document readily available, since many of the details of the guidelines are contained in the UK Guide.

The UK Guide has three sections for recommendations: Schema Guidelines, Schema Component Guidelines, and Metadata and Schemas. All three sections will be considered in Section 2, along with summaries and comments on variances from their recommendations. These variances will be minor, and, as a whole, the UK Guide is adopted as a guideline for POSC development (see footnote 2).

1.2. Additional Guidelines

There are many issues that are not covered in the UK Guide. These fall into the realm of best practices, and are discussed in detail in several papers developed by POSC. These documents will also be part of the Guidelines. Section 3 will cover these best practice documents.


2. Summary of the UK Guide

Three chapters will be covered in summary fashion. The guidelines are summarized in Sections 2.1 and 2.2 below, and each comprises (in general) three bulleted items. The first bullet is a quote from the UK Guide, and the second is a statement of POSC compliance with that item. The third (and occasionally an additional) bullet is a comment on the degree of POSC endorsement.

2.1. Chapter on XML Schema Guidelines

Primary Schema Language:

Schema Complexity:

Model Data, Not Forms

Use of Namespace and Qualifiers

Namespace and versions

Use of elementFormDefault and attributeFormDefault

Messages and Schema

Data Type vs. Element Declarations

Global Definitions:

Common Definitions and Namespaces:

Elements vs. Attributes:

Indicating Value Sets

Representing alternative conditions:

Commenting Schemas:

Use of Schema Reuse Features

2.2. XML Schema Component Guidelines

Naming Conventions

Naming Conventions

Naming Conventions

Naming Conventions

Use of Government Data Standards Catalogue

Use of Schema Inheritance

Data Content of Elements

Local vs. Global Attribute Definitions

Use of Mixed Content Model for Data
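The POSC naming policy recorded in the modification note at the top of this document (element names in UpperCamelCase, attribute names in lowerCamelCase) can be illustrated with a minimal sketch. The namespace URI and every element and attribute name below are hypothetical, chosen only to show the two casing styles side by side; none is taken from a POSC schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: the namespace URI and all names are illustrative only. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/sketch/naming"
           elementFormDefault="qualified">

  <!-- Element names use UpperCamelCase. -->
  <xs:element name="WellBore">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="TotalDepth" type="xs:double"/>
      </xs:sequence>
      <!-- Attribute names use lowerCamelCase. -->
      <xs:attribute name="dataSource" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```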

2.3. Metadata and Schemas

This section deals with defining metadata for the data elements for inclusion in a registry, and for indicating versions of the schema being used. None of the recommendations in this section are part of the POSC guidance document.


3. Additional Best Practices

3.1. Supplements to Section 2

Following are additional guidelines developed by POSC:

Naming Conventions:

Naming Conventions:

'Final' facet, 'block' facet

3.2. Architectural Practices at POSC

Many elements of the POSC XML schema architecture are explained in additional papers. These will be referenced later in this section.

The main part of the POSC architecture is the module. A module is a business object, encoded in XML and characterized by its XML Schema. Examples of modules would be a Well, a Business Associate, and a Directional Survey. A module is characterized by having a natural identifier. Its instantiation would be a self-contained set of information.
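As a rough sketch of what a module schema could look like, the fragment below declares a Well as a global element carrying a required identifier, so that an instance is a self-contained document. The namespace URI, the type name, and the child elements (Identifier, Name, Operator) are assumptions made for illustration and are not the actual POSC Well schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical module: one business object with a natural identifier. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/sketch/well"
           xmlns:w="http://example.org/sketch/well"
           elementFormDefault="qualified">

  <!-- The global element is the root of a self-contained instance. -->
  <xs:element name="Well" type="w:WellType"/>

  <xs:complexType name="WellType">
    <xs:sequence>
      <!-- The natural identifier is required; other properties are optional. -->
      <xs:element name="Identifier" type="xs:string"/>
      <xs:element name="Name" type="xs:string" minOccurs="0"/>
      <xs:element name="Operator" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```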

Comparable to a module, but slightly different, is the table. A business object has properties. A table is a complex property of an object that is usually displayed as a table. Examples of tables would be daily production of a well, chemical usage for a month, and directional survey data. A table has no natural identifier. It may be instantiated as a self-contained set of information, but it only makes sense in its context as a property of some business object. A table is generally used in only a single schema.
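One possible way to express a table, assuming the daily production example, is a repeating row element declared locally inside the business object that owns it. The sketch below is not taken from the XML Tables paper [XMLTables]; the namespace URI and the names DailyProduction, Row, Date, OilVolume, and GasVolume are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical table: daily production as a property of a Well. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/sketch/table"
           elementFormDefault="qualified">

  <xs:element name="Well">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Identifier" type="xs:string"/>
        <!-- The table has no identifier of its own; it is declared locally
             because it is meaningful only as a property of this Well. -->
        <xs:element name="DailyProduction" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <!-- Each Row is one line of the displayed table. -->
              <xs:element name="Row" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Date" type="xs:date"/>
                    <xs:element name="OilVolume" type="xs:double"/>
                    <xs:element name="GasVolume" type="xs:double"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Declaring the table with anonymous local types reflects the observation that a table is generally used in a single schema; a named global type would be the alternative if reuse were expected.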

Another type of complex data is an assembly. An assembly is a set of information that is tightly coupled, but is usually not displayed as a table. However, an assembly does constitute information that can be used in many contexts. An example of an assembly would be a survey location, which consists of (for example) a datum, a latitude, and a longitude. Other examples are a location in the US land survey system and a postal address. An assembly obtains its full semantic meaning only within the context of its use. For example, a survey location can be used as a well surface location, a wellbore bottomhole location, the location of a tower, or the location of a ship antenna.
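The survey location example can be sketched as a named complex type that is reused under different element names, so that the element name supplies the context in which the assembly acquires its meaning. The namespace URI and all names below are hypothetical, and representing latitude and longitude as plain decimal numbers is a simplifying assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical assembly: one reusable type, several contexts of use. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/sketch/assembly"
           xmlns:a="http://example.org/sketch/assembly"
           elementFormDefault="qualified">

  <!-- The assembly: a tightly coupled group of values. -->
  <xs:complexType name="SurveyLocationType">
    <xs:sequence>
      <xs:element name="Datum" type="xs:string"/>
      <xs:element name="Latitude" type="xs:double"/>
      <xs:element name="Longitude" type="xs:double"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The same type used in two contexts; only the element name differs. -->
  <xs:element name="WellSurfaceLocation" type="a:SurveyLocationType"/>
  <xs:element name="WellboreBottomholeLocation" type="a:SurveyLocationType"/>

</xs:schema>
```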

A less complex data type than an assembly would be a component (see footnote 4). A component is a single element, or a very tight group of elements that generally represent a single value. Examples of components would be a measure type (value and unit of measure), complex number (real and imaginary parts), and an azimuth (value, unit of measure, and reference north direction). Components are used to build assemblies, tables, and modules.
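A measure type, for instance, might be sketched as a numeric value that carries its unit of measure. Holding the unit in an attribute named uom is an assumption made here for illustration, not the recommendation of the Units of Measure paper [UOMRecs]; the namespace URI and the names MeasureType and MeasuredDepth are likewise hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical component: a value together with its unit of measure. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/sketch/component"
           xmlns:c="http://example.org/sketch/component"
           elementFormDefault="qualified">

  <!-- Simple content (the number) extended with a required unit attribute. -->
  <xs:complexType name="MeasureType">
    <xs:simpleContent>
      <xs:extension base="xs:double">
        <xs:attribute name="uom" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- A component in use: a single value built on the measure type. -->
  <xs:element name="MeasuredDepth" type="c:MeasureType"/>

</xs:schema>
```

Under this sketch, an instance would be a MeasuredDepth element in the namespace above with content such as 2450.5 and a uom attribute such as "m".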

It is not always clear where the lines are drawn between these various types, and the distinction is generally not important. The classification does, however, give a reasonable taxonomy for discussion.

3.2.1. Guidelines for POSC Architecture

Namespaces for architectural types:

3.3. Guidelines for Modules

See the Introduction to Modules paper [IntroModule] for more examples of the modules.

For an example of building a module, see the paper, Build a Module [BuildModule].

Modules may be imported into other modules. See Importing Modules within your Module [ImportModule] for guidelines and best practices.
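In W3C XML Schema, a schema in one target namespace is brought into a schema with a different target namespace by xs:import (xs:include is used when both share the same namespace). The sketch below shows the general mechanism only and is not taken from [ImportModule]. The namespace URIs are hypothetical, and well.xsd is a hypothetical file that is assumed to declare a global Well element in the imported namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical importing module: a Directional Survey that refers to a Well. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/sketch/survey"
           xmlns:w="http://example.org/sketch/well"
           elementFormDefault="qualified">

  <!-- Import the other module's namespace; well.xsd is an assumed file name. -->
  <xs:import namespace="http://example.org/sketch/well"
             schemaLocation="well.xsd"/>

  <xs:element name="DirectionalSurvey">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Identifier" type="xs:string"/>
        <!-- Reuse the imported module's global element by reference. -->
        <xs:element ref="w:Well"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```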

For a description of modules, profiles, and application schemas, see the paper, Modules, Profiles, and Application Schema [ProfilesAppSchema].

Outside groups may use POSC modules, tables, assemblies, and components. See the paper on Policies on Modules [ModulePolicies] for the requirements for importing and/or including schemas. POSC will also follow the rules for importing modules from other groups.

POSC Usage of Modules

Modification of Modules

POSC Usage of Components and Assemblies

Modification of Components and Assemblies

3.4. Additional POSC Best Practices

POSC, in conjunction with other groups, developed a best practice for handling units of measure. The recommendations for this are given in the Units of Measure Recommendation paper, [UOMRecs].

POSC has developed a process for enumerated data. The types of enumerated data, and how they should be handled, are discussed in the paper, Reference Data and Enumerated Lists Implemented in XML, [ReferenceData].
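The simplest form of enumerated data in W3C XML Schema is a fixed list of values embedded in the schema as a restriction of a built-in type. The sketch below shows only that form; it is not taken from [ReferenceData], which discusses several types of enumerated data and how each should be handled. The namespace URI, the type name, and the listed values are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical enumerated list embedded directly in the schema. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/sketch/enum"
           xmlns:e="http://example.org/sketch/enum"
           elementFormDefault="qualified">

  <!-- Only the listed values are valid for elements of this type. -->
  <xs:simpleType name="WellStatusType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="active"/>
      <xs:enumeration value="suspended"/>
      <xs:enumeration value="abandoned"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="WellStatus" type="e:WellStatusType"/>

</xs:schema>
```

A list embedded this way can be changed only by revising the schema, which is one reason a longer or frequently changing list may be better kept outside the schema.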

Data models have relationships between entities, which are emulated by relationships between XML modules. POSC has developed a paper on the three ways to handle relationships, with guidelines on handling the relationships. See Relationships in XML [Relationships].

Dictionaries and registries are useful for capturing information about commonly referenced objects (such as coordinate reference systems). POSC has developed a paper, Examples of XML Dictionary Usage [Dictionaries], along with some examples of dictionary usage in XML.


4. References

4.1. Outside References

[ANSIX12] X12 Reference Model for XML Design, 2002-10, produced by the ANSI X12 committee, obtainable at http://www.x12.org/x12org/.

[BestPractices] Best Practices Homepage, developed and maintained by XML-dev and Mitre, obtainable at http://www.xfront.com/BestPracticesHomepage.html.

[ComProServ] PIDX XML Standards Master, Version 1.0, RP 3901, produced by PIDX, obtainable at http://committees.api.org/business/pidx/standards.htm.

[EBCCNAM] ebXML RT - Naming Convention for Core Component, 2001-05-10, produced by the ebXML group, obtainable at http://www.ebxml.org/specs/index.htm#technical_reports.

[ebTechArch] ebXML Technical Architecture Specification V1.0.4, 2001-02-16, produced by the ebXML group, obtainable at http://www.ebxml.org/specs/ebTA.pdf.

[FedDevGuide] Draft Federal XML Developer's Guide, 2002-04 (work in progress), produced by the Federal CIO Council, obtainable at http://xml.gov/documents/in_progress/developersguide.pdf.

[FedTagStds] Federal Tag Standards for Extensible Markup Language, 2001-06, produced by LMI, not obtainable from the internet.

[HKGuide] XML Schema Design and Management Guide, (4 parts), Draft versions dated in summer, 2003. Produced by Hong Kong Information Services Technology Division. Available at http://www.itsd.gov.hk/itsd/english/infra/eif.htm.

[IETFKeywords] Key Words for Use in RFCs to Indicate Requirement Level, 1997-03, obtainable at http://www.ietf.org/rfc/rfc2119.txt.

[ISO8601] International Standard Date and Time, 2001-11-10, produced by ISO. A web page that explains the formats is http://www.cl.cam.ac.uk/~mgk25/iso-time.html.

[ISO11179] ISO 11179 Part 5 - Naming and Identification, 1995-12, produced by ISO, obtainable at http://fdr.faa.gov/iso/ISO11179page.htm. A later version is available from the ISO website.

[UKGuide] e-Government Schema Guidelines for XML, 2002-12, produced by United Kingdom e-Envoy, obtainable at http://www.e-envoy.gov.uk/Resources/Guidelines/fs/en.

[Unicode] Unicode Charts, available at http://www.unicode.org/charts/.

[W3CSchemaDatatypes] W3C Schema Datatypes, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-2.

[W3CNamespaces] Namespaces in XML, 1999-01-14, produced by W3C, obtainable at http://www.w3.org/TR/REC-xml-names/.

[W3CSchemaPrimer] W3C Schema Primer, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-0.

[W3CSchemaStructures] W3C Schema Structures, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-1.

[Xlink] W3C XLink Specification, 2001-06, produced by W3C, obtainable at http://www.w3.org/TR/xlink/.

[Xpath] W3C XPath Specification, 1999, produced by W3C, obtainable at http://www.w3.org/TR/xpath/.

[XSL] W3C XSL and XSLT Specifications, produced by W3C, obtainable at http://www.w3.org/Style/XSL/.

4.2. POSC References

POSC references are available in the following formats:

[html] html format readable by browsers

[doc] MS Word 97/2000/XP

[sxw] OpenOffice writer, v1.0

[IntroModule] Introduction to Modules, Copyright 2002-2003. Available in [html], [doc], [sxw].

[BuildModule] Build a Module - a tutorial. Copyright 2003. Available in [html], [doc], [sxw].

[ImportModule] Importing Modules within your Modules. Copyright 2003. Available in [html], [doc], [sxw].

[Guidelines] Guidelines for XML Schemas, Version 2003. Copyright 2003. Available in [html], [doc], [sxw].

[ModulePolicies] Policies on Modules. Copyright 2002-2003. Available in [html], [doc], [sxw].

[ProfilesAppSchema] Modules, Profiles, and Application Schemas. Copyright 2002-2003. Available in [html], [doc], [sxw].

[XMLTables] XML Tables. Copyright 2003. Available in [html], [doc], [sxw].

[ReferenceData] Reference Data and Enumerated Lists Implemented in XML. Copyright 2002-2003. Available in [html], [doc], [sxw].

[Dictionaries] Examples of XML Dictionary Usage. Copyright 2003. Available in [html], [doc], [sxw]. Accompanied by sample code.

[Relationships] Relationships in XML. Copyright 2003. Available in [html], [doc], [sxw].

[UOMRecs] Unit of Measure Recommendations. Copyright 2002-2003. Available in [html], [doc], [sxw].




2 The first two sections will be adopted, with modifications as noted in Section 2. The third section of the UK Guide will not be adopted, since this section is more relevant to the particular practices of the group.

3 UK Guide comment: “In general, this means that architectural schemas will use a salami slice and/or Venetian blind style, while message schemas will use a Russian Doll style.” See the Best Practices Web Page [BestPractices] for a discussion of these styles.

4 Some architectures have a block, which is intermediate between an assembly and a component. The distinction among those three is not always clear. POSC will maintain the two concepts only.
