Guidelines for XML Schema
Version 2003
John I. Bobbitt
POSC
License Agreement: © 2004, Petrotechnical Open Standards Consortium, Inc. All rights reserved. All access, receipt, and/or use of this document is subject to the POSC Product Licensing Agreement posted on the POSC Web site at http://www.posc.org/about/license.shtml.
Abstract: Guidelines and recommended practices are given for development of XML Schemas within the POSC framework.
Modified 2004-03-09: revised our agreement with the UK Guide to state that POSC policy is that element names are UpperCamelCase and attribute names are lowerCamelCase.
A developer of XML schemas is faced with many different style and implementation decisions in the development process. A set of guidelines will help the developer eliminate many of these choices, and lead to more consistent schemas. They will also lead to more consistent XML instance documents based on the schemas.
This project was initiated in 2001. At that time, there were a dozen or so similar documents by different groups. The goal of the POSC document was to find where consensus existed on many of the issues, and to document and follow these consensus best practices. The first version of this document discussed and codified some of these issues.
In general, usage of XML schemas was too new to have a consensus of best practices. Most of the other guidelines documents were struggling to understand the choices, and to develop their guidelines. Since then, there has been more understanding of the choices, and consensus in some of the areas.
Rather than repeat all of these items in 'yet another guidelines document,' we will adopt a particular document, and comment briefly on variations from that document. The adopted document is e-Government Schema Guidelines for XML, Version 3.0, dated December 2002, hereinafter referred to as the 'UK Guide.' It is published by the Office of the e-Envoy of the United Kingdom, and is available at http://www.e-envoy.gov.uk/Resources/Guidelines/fs/en (see [UKGuide]). It is recommended that the reader of this paper also have this document readily available, since many of the details of the guidelines are contained in the UK Guide.
The UK Guide has three sections for recommendations: Schema Guidelines, Schema Component Guidelines, and Metadata and Schemas. All three sections will be considered in Section 2, along with summaries and comments on variances from their recommendations. These variances will be minor, and, as a whole, the UK Guide is adopted as a guideline for POSC development (see footnote 2).
There are many issues that are not covered in the UK Guide. These fall into the realm of best practices, and are discussed in detail in several papers developed by POSC. These documents will also be part of the Guidelines. Section 3 will cover these best practice documents.
The three chapters of the UK Guide will be covered in summary fashion. The guidelines are summarized in sections 2.1 and 2.2 below, and each comprises (in general) three bulleted items. The first is a quote from the UK Guide, and the second is a statement of POSC compliance with that item. The third (and occasionally an additional) bullet is a comment on the degree of POSC endorsement.
Primary Schema Language:
The primary schema language SHALL be XML Schema
Endorsed by POSC
Comment: The alternative would be DTDs. POSC will no longer develop in DTDs, but will keep the DTDs developed for earlier uses.
Schema Complexity:
Develop schemas using the simplest schema methods available.
Endorsed by POSC
Model Data, Not Forms
Endorsed by POSC
Forms generally reflect a process or an event. The application schema may collect together modules to model these processes, but the modules themselves should reflect the data.
Use of Namespace and Qualifiers
If your schema document has a target namespace, any default namespace for the document MUST be the same as the target namespace.
Endorsed by POSC.
In this guideline, 'the document' refers to the XML schema document.
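The rule above can be checked mechanically. The following is a minimal sketch in Python, using only the standard library. The schema fragment, the namespace URI http://example.org/well, and the function name are hypothetical and are not taken from any POSC schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical schema document; the namespace URI is invented for illustration.
SCHEMA = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/well"
            targetNamespace="http://example.org/well">
  <xsd:element name="Well" type="WellStructure"/>
  <xsd:complexType name="WellStructure">
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""

def default_namespace_matches_target(schema_bytes):
    """Return True unless a default namespace is declared that differs
    from the schema's target namespace."""
    default_ns = None
    # "start-ns" events report each namespace declaration as (prefix, uri);
    # the default namespace is reported with an empty prefix.
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(schema_bytes),
                                              events=("start-ns",)):
        if prefix == "" and default_ns is None:
            default_ns = uri  # keep the first (outermost) declaration
    target = ET.fromstring(schema_bytes).get("targetNamespace")
    if target is None or default_ns is None:
        return True  # the rule only applies when both are present
    return default_ns == target
```

With the default namespace equal to the target namespace, the unprefixed reference type="WellStructure" resolves to the type defined in the same schema, which is the practical benefit of the rule.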
Namespace and versions
A namespace URI MUST not contain version information
Endorsed, with modification, by POSC
The POSC interpretation is that the namespace may contain version information, but it must not be used to determine the version.
Versioning strategies are still unresolved in the XML community. There is no best practice agreement on where this information should be, or what it means. However, there does seem to be agreement that the namespace should not be used to determine the version. Various ways to indicate a schema version are: (1) use the built-in version attribute of the schema element; (2) define a standard header as a schema comment; (3) add, in a standard way, a schemaVersion element to the root element of a message; (4) incorporate the version into the target namespace; (5) include the version in the pathname where the schema is stored. This guideline prohibits the use of method 4. The guideline, as developed in the UK Guide, may or may not prohibit method 5, but POSC allows this method of organizing files.
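As an illustration of method 1, the following minimal sketch in Python reads the version from the built-in version attribute of the schema element, leaving the target namespace free of version information. The schema fragment, its namespace URI, and the version value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the version is carried by the built-in "version"
# attribute, not by the target namespace.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/well"
            version="1.2">
  <xsd:element name="WellName" type="xsd:string"/>
</xsd:schema>"""

def schema_version(schema_text):
    """Return the value of the schema element's version attribute, or None."""
    return ET.fromstring(schema_text).get("version")
```

A consumer that needs the version would call schema_version on the schema text rather than parse the namespace URI.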
Use of elementFormDefault and attributeFormDefault
elementFormDefault MUST be set to qualified and attributeFormDefault SHOULD be set to unqualified.
Endorsed by POSC.
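A minimal sketch of this guideline follows, in Python with the standard library only. The schema fragment and the function name are hypothetical. Note that XML Schema treats both attributes as unqualified when they are omitted, so elementFormDefault must be set explicitly to meet the guideline.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema header that follows the guideline.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/well"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
  <xsd:element name="WellName" type="xsd:string"/>
</xsd:schema>"""

def form_defaults(schema_text):
    """Return (elementFormDefault, attributeFormDefault) for a schema.

    Both values default to "unqualified" when the attribute is absent.
    """
    root = ET.fromstring(schema_text)
    return (root.get("elementFormDefault", "unqualified"),
            root.get("attributeFormDefault", "unqualified"))

def follows_form_guideline(schema_text):
    """True when elements are qualified and attributes are unqualified."""
    return form_defaults(schema_text) == ("qualified", "unqualified")
```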
Messages and Schema
A message schema SHOULD describe a single type of XML message.
Endorsed by POSC
A message schema, as the term is used by the UK Guide, is equivalent to the POSC term application schema (see [ProfilesAppSchema]). It is a collection of modules and additional schema, brought together for the purposes of a single data exchange need.
Data Type vs. Element Declarations
In many cases, there is a choice of defining a re-usable component as either a data type or as an element. A component MUST be defined as a data type if either: (1) it is to be used with different element names in different contexts; or (2) it is expected that further data types will be derived from it. A component MUST be defined as an element if (1) there is no intention to derive new components from it; and (2) the element is to be used with its name unchanged
Endorsed, with modification, by POSC
Any global element defined by POSC will refer to a defined datatype (or to a W3C datatype). Note that the UK Guide is referring to global elements, and not locally defined elements with anonymous datatypes.
A further modification concerns elements such as Name and Comment, which are used many times but need not be global elements.
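The POSC modification, that a global element refers to a defined datatype, can be sketched as follows in Python. The schema fragments and the function name are hypothetical. The check is deliberately simple: it only looks for a type attribute on global element declarations and does not account for elements that obtain their type through a substitution group.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: the global element refers to a named datatype,
# and the frequently used Name element is declared locally.
GOOD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Well" type="WellStructure"/>
  <xsd:complexType name="WellStructure">
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

# Hypothetical counter-example: a global element with an anonymous type.
BAD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Well">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Name" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

def global_elements_without_named_type(schema_text):
    """List the names of global elements that lack a type attribute."""
    root = ET.fromstring(schema_text)
    # Global declarations are the direct children of the schema element.
    return [decl.get("name")
            for decl in root.findall(XSD + "element")
            if decl.get("type") is None]
```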
Global Definitions:
Schema documents SHOULD only make available globally those component definitions that are either (1) re-used within the schema; (2) to be made available for re-use in other schemas; or (3) are intended to be used as the document element of instance documents (see footnote 3).
Endorsed, with modification, by POSC
POSC also allows abstract, global elements to serve as placeholders for extensions. POSC also encourages the definition of a global element where a common term can be used, which will allow this element to have the same POSC namespace as its subelements.
Common Definitions and Namespaces:
An architectural schema that contains a collection of schema components (elements and/or datatypes) that will be reused locally within a number of schemas, SHOULD be defined without a target namespace.
Endorsed by POSC.
The UK Guide also anticipates a global catalog of elements that are specific to UK government use. They require that these be in a single namespace.
These apply to components and assemblies in the POSC architecture.
Elements vs. Attributes:
Elements SHOULD be the main holders of information. Attributes SHOULD be used to hold ancillary metadata which provide more information about the element. Attributes SHOULD NOT be used to qualify other attributes.
Endorsed by POSC
Indicating Value Sets
Value sets SHOULD be indicated by an attribute on the element whose text content holds the value belonging to the value set.
NOT endorsed by POSC
The POSC term for a Value Set is an Enumerated List, or Reference Data. POSC has identified ways to handle such data, depending on the usage case for the enumerated list or reference data. The best practices will be given in an accompanying paper, [ReferenceData].
Representing alternative conditions:
Alternative conditions SHOULD be represented using attribute values rather than by presence or absence of an element.
Endorsed, with modification, by POSC.
Alternative conditions may also be represented using an element. The preferred datatype for such a representation is the boolean type.
Commenting Schemas:
In documenting a W3C XML Schema, the documentation element MUST be used rather than XML comments.
Endorsed, with modification, by POSC
When documentation is intended to document the meanings of elements, choices, etc., the documentation element MUST be used. For comments intended for internal understanding of the schema, XML comments MAY be used.
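The distinction can be sketched as follows in Python. The schema fragment, the element names, and the function name are hypothetical. The sketch treats an element as documented only when it carries an annotation with a documentation child; XML comments are ignored, since the standard parser discards them.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: WellName is documented with the documentation
# element; Depth carries only an XML comment.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Internal note for schema maintainers: XML comments are allowed here. -->
  <xsd:element name="WellName" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>The name by which the well is commonly known.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <!-- Measured depth of the well. -->
  <xsd:element name="Depth" type="xsd:double"/>
</xsd:schema>"""

def undocumented_elements(schema_text):
    """List named elements that have no annotation/documentation child."""
    root = ET.fromstring(schema_text)
    doc_path = XSD + "annotation/" + XSD + "documentation"
    return [decl.get("name")
            for decl in root.iter(XSD + "element")
            if decl.find(doc_path) is None]
```

In this example the comment on Depth describes the meaning of the element, so under the POSC guideline it belongs in a documentation element instead.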
Use of Schema Reuse Features
Use of xsd:redefine SHOULD be avoided.
Endorsed, with modification, by POSC.
POSC recommends that xsd:redefine never be used.
Naming Conventions
The names of complex data types SHOULD end with the text string 'Structure'. The name of simple data types SHOULD end with the text string 'Type'. Because of this, avoid these endings for element names
Modified by POSC.
POSC presently has many complex data types that end with the string 'Type'. These will not be altered. Future developments will use the string 'Structure'. POSC presently ends the names of enumerated lists with the string 'Enum'. This will be continued.
Naming Conventions
Abbreviations SHOULD NOT be used in elements and attributes.
Endorsed by POSC
POSC allows abbreviations in certain circumstances. Such abbreviations MUST be justifiable, and SHOULD be noted in an XML comment in the schema file.
Examples of abbreviations are ID, uom, POSC, UK. Examples that should NOT be used are Intl, Num.
Naming Conventions
All names SHOULD use upper camel case. That is, names start with an initial capital, then each new word within the name starts with an initial capital. Where an all uppercase abbreviation (such as UK) or a digit is incorporated into a name, the following word should start with a lower case letter.
Modified by POSC in two respects.
POSC will follow the abbreviation with an upper case letter. POSC guidelines are that attributes should be lower camel case.
Example of the difference: the UK Guide would have 'UKpostalCode', while POSC would have 'UKPostalCode'. As an attribute, it would be 'ukPostalCode'.
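The POSC convention can be expressed as a pair of small helper functions. This is a minimal sketch in Python; the function names are hypothetical, and the input is assumed to be a non-empty list of words in which abbreviations are already written in upper case.

```python
def upper_camel(words):
    """Join words into an UpperCamelCase element name (POSC convention).

    All-uppercase abbreviations such as "UK" are kept as written, and the
    word that follows an abbreviation still starts with a capital letter.
    """
    return "".join(w if w.isupper() else w[:1].upper() + w[1:] for w in words)

def lower_camel(words):
    """Join words into a lowerCamelCase attribute name (POSC convention).

    The first word is lowered entirely, so a leading abbreviation such as
    "UK" becomes "uk"; the remaining words follow the UpperCamelCase rule.
    """
    return words[0].lower() + upper_camel(words[1:])

element_name = upper_camel(["UK", "postal", "code"])    # "UKPostalCode"
attribute_name = lower_camel(["UK", "postal", "code"])  # "ukPostalCode"
```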
Naming Conventions
Enumerated values SHOULD use lower case throughout. Where the value is a proper name or an abbreviation or acronym that normally is used with different capitalization, the usual capitalization should be used.
Endorsed, with modification, by POSC
POSC guidelines are that you SHOULD use lower case throughout, with a comparable exception for abbreviations and proper names.
Use of Government Data Standards Catalogue
Not a part of the POSC guidelines.
Use of Schema Inheritance
If an existing definition does not meet your exact requirements, you MAY use the XML Schema inheritance mechanism to define a new data type based largely on an existing one.
Endorsed by POSC.
Schema developers using a POSC module or table should develop extensions in their own namespace. See [ModulePolicies].
Data Content of Elements
Optional elements which are designed to have content SHOULD NOT be allowed to occur empty. The schema SHOULD ensure that they are either absent or populated. Mandatory elements which are designed to have content SHOULD not be allowed to occur empty. The schema SHOULD ensure that they are populated.
Endorsed by POSC
Local vs. Global Attribute Definitions
In general, attributes SHOULD be given a local scope by defining them within the context of their owning element.
Endorsed by POSC.
Use of Mixed Content Model for Data
In a data-centric document, the mixed content model (where an element contains both other elements and character data), SHOULD be avoided
Endorsed by POSC
This section deals with defining meta data for the data elements for inclusion in a registry, and for indicating versions of the schema being used. None of the recommendations in this section are part of the POSC guidance document.
The following are additional guidelines developed by POSC.
Naming Conventions:
Abstract elements are allowed, and MUST be global. All abstract element names MUST have an initial character of underscore (“_”).
An abstract element is generally intended to be a placeholder for an extension. The underscore highlights its abstract nature.
Naming Conventions:
Abstract types are allowed, and MUST begin with an initial “Abstract”.
An abstract type is generally intended to be a placeholder for a user-defined type. The “Abstract” highlights its abstract nature.
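The two naming rules for abstract components can be sketched as a simple check in Python. The schema fragments, the component names, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema that follows both naming rules.
GOOD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="_Extension" type="AbstractExtensionStructure" abstract="true"/>
  <xsd:complexType name="AbstractExtensionStructure" abstract="true"/>
</xsd:schema>"""

# Hypothetical counter-example that breaks both rules.
BAD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Extension" type="ExtensionStructure" abstract="true"/>
  <xsd:complexType name="ExtensionStructure" abstract="true"/>
</xsd:schema>"""

def is_abstract(component):
    # The abstract attribute is an xsd:boolean, so "1" is also valid.
    return component.get("abstract") in ("true", "1")

def abstract_naming_violations(schema_text):
    """List abstract elements and types whose names break the POSC rules."""
    root = ET.fromstring(schema_text)
    problems = []
    for decl in root.iter(XSD + "element"):
        if is_abstract(decl) and not decl.get("name", "").startswith("_"):
            problems.append("element " + decl.get("name", ""))
    for ctype in root.iter(XSD + "complexType"):
        if is_abstract(ctype) and not ctype.get("name", "").startswith("Abstract"):
            problems.append("complexType " + ctype.get("name", ""))
    return problems
```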
'Final' facet, 'block' facet
The use of the 'final' and 'block' facets, which control extensions and restrictions, SHOULD be avoided.
The final and block facets are facets that allow the developer of the schema to restrict and/or prohibit restrictions and extensions. Use of these with some definitions may imply to profilers that restrictions and extensions are allowed with definitions that do not contain these facets. It is preferred that documentation clearly state which schema definitions are extendable.
This policy is subject to change. The use of 'final' and 'block' is not prohibited at this point.
Many elements of the POSC XML schema architecture are explained in additional papers. These will be referenced later in this section.
The main part of the POSC architecture is the module. A module is a business object, encoded in XML and characterized by its XML Schema. Examples of modules would be a Well, a Business Associate, and a Directional Survey. A module is characterized by having a natural identifier. Its instantiation would be a self-contained set of information.
Comparable to, but slightly different from a module, is the table. A business object has properties. A table is a complex property of an object that is usually displayed as a table. Examples of tables would be daily production of a well, chemical usage for a month, and directional survey data. A table has no natural identifier. It may be instantiated as a self-contained set of information, but only makes sense in its context as a property of some business object. A table generally only has use in a single schema.
Another type of complex data is an assembly. An assembly is a set of information that is tightly coupled, but is usually not displayed as a table. However, an assembly does constitute information that can be used in many contexts. An example of an assembly would be a survey location, which consists of (for example) a datum, a latitude, and a longitude. Other examples are a location in the US land survey system and a postal address. An assembly obtains its full semantic meaning only within the context of its use. For example, a survey location can be used as a well surface location, a wellbore bottomhole location, the location of a tower, or the location of a ship antenna.
A less complex data type than an assembly would be a component (see footnote 4). A component is a single element, or a very tight group of elements that generally represent a single value. Examples of components would be a measure type (value and unit of measure), complex number (real and imaginary parts), and an azimuth (value, unit of measure, and reference north direction). Components are used to build assemblies, tables, and modules.
It is not always clear where the lines are drawn between these various types, and the distinction is generally not important. However, the classification does give a reasonable taxonomy for discussion.
Namespaces for architectural types:
Components and Assemblies MUST be developed WITH NO target namespace (the chameleon model). Modules MUST be developed WITH a target namespace. A table may be developed with a target namespace, depending on its usage.
Components and assemblies are re-usable parts that are not complete until they are incorporated into a module or table. The usage of a table will determine whether it should have a separate namespace or not.
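The namespace rule for architectural types can be sketched as follows in Python. The two schema fragments, the file name measure.xsd, the namespace URI, and the function name are hypothetical. The component has no target namespace and is brought into the module with xsd:include, so its definitions take on the namespace of the including module (the chameleon model).

```python
import xml.etree.ElementTree as ET

# Hypothetical component: no target namespace (chameleon model).
COMPONENT = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="MeasureStructure">
    <xsd:simpleContent>
      <xsd:extension base="xsd:double">
        <xsd:attribute name="uom" type="xsd:string" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>"""

# Hypothetical module: has a target namespace and includes the component.
MODULE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/well"
            targetNamespace="http://example.org/well"
            elementFormDefault="qualified">
  <xsd:include schemaLocation="measure.xsd"/>
  <xsd:element name="Well" type="WellStructure"/>
  <xsd:complexType name="WellStructure">
    <xsd:sequence>
      <xsd:element name="TotalDepth" type="MeasureStructure"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

def follows_namespace_rule(kind, schema_text):
    """Check the target-namespace rule for one architectural type."""
    has_target = ET.fromstring(schema_text).get("targetNamespace") is not None
    if kind in ("component", "assembly"):
        return not has_target   # MUST have no target namespace
    if kind == "module":
        return has_target       # MUST have a target namespace
    if kind == "table":
        return True             # either choice is allowed, depending on usage
    raise ValueError("unknown architectural type: " + kind)
```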
See the Introduction to Modules paper [IntroModule] for more examples of the modules.
For an example of building a module, see the paper, Build a Module [BuildModule].
Modules may be imported into other modules. See Importing Modules within your Module [ImportModule] for guidelines and best practices.
For a description of modules, profiles, and application schemas, see the paper, Modules, Profiles, and Application Schema [ProfilesAppSchema].
Outside groups may use POSC modules, tables, assemblies, and components. See the paper on Policies on Modules [ModulePolicies] for the requirements for importing and/or including schemas. POSC will also follow the rules for importing modules from other groups.
POSC Usage of Modules
POSC MUST follow the practices outlined in Policies on Modules [ModulePolicies] when using modules from other groups. Other groups SHOULD follow the policies when importing modules from the POSC namespace.
Note that modules are imported into other namespaces, and must retain their original namespace when imported.
Modification of Modules
Modules SHALL NOT be changed other than by subsetting and extension when imported. Imported modules MAY be profiled to include subsetting and extensions where permitted.
Modules are considered to be the final representation of business objects that are re-usable. Changes to modules which would affect the output XML would destroy interoperability.
If changes to modules are desirable, the owner of the module should be contacted, and the changes should be made by the owner.
POSC Usage of Components and Assemblies
POSC MAY use components and assemblies from other groups. If used, POSC MUST document the source of the component or assembly. Other groups SHOULD follow this practice.
Note that the components and assemblies are used by including the schema, usually by cut-and-paste, into the POSC namespace. Other groups are encouraged to also do this.
Modification of Components and Assemblies
Components and assemblies incorporated into POSC modules SHALL be modified to conform to the POSC guidelines. The structure of the original SHALL be retained to the greatest extent possible.
Internal changes to names of types are irrelevant to the final XML output. The only questionable practice is that the case of elements and attributes may be changed. At the component and assembly level, it is considered preferable to allow such changes.
POSC, in conjunction with other groups, developed a best practice for handling units of measure. The recommendations for this are given in the Units of Measure Recommendation paper, [UOMRecs].
POSC has developed a process for enumerated data. The types of enumerated data, and how they should be handled, are discussed in the paper, Reference Data and Enumerated Lists Implemented in XML, [ReferenceData].
Data models have relationships between entities, which are emulated by relationships between XML modules. POSC has developed a paper on the three ways to handle relationships, with guidelines on handling the relationships. See Relationships in XML, [Relationships].
Dictionaries and registries are useful for capturing information about commonly referenced objects (such as coordinate reference systems). POSC has developed a paper, Examples of XML Dictionary Usage, [Dictionaries], along with some examples of dictionary usage in XML.
[ANSIX12] X12 Reference Model for XML Design, 2002-10, produced by the ANSI X12 committee, obtainable at http://www.x12.org/x12org/.
[BestPractices] Best Practices Homepage, developed and maintained by XML-dev and Mitre, obtainable at http://www.xfront.com/BestPracticesHomepage.html.
[ComProServ] PIDX XML Standards Master, Version 1.0, RP 3901, produced by PIDX, obtainable at http://committees.api.org/business/pidx/standards.htm.
[EBCCNAM] ebXML RT - Naming Convention for Core Component, 2001-05-10, produced by the ebXML group, obtainable at http://www.ebxml.org/specs/index.htm#technical_reports.
[ebTechArch] ebXML Technical Architecture Specification V1.0.4, 2001-02-16, produced by the ebXML group, obtainable at http://www.ebxml.org/specs/ebTA.pdf.
[FedDevGuide] Draft Federal XML Developer's Guide, 2002-04 (work in progress), produced by the Federal CIO Council, obtainable at http://xml.gov/documents/in_progress/developersguide.pdf.
[FedTagStds] Federal Tag Standards for Extensible Markup Language, 2001-06, produced by LMI, not obtainable from the internet.
[HKGuide] XML Schema Design and Management Guide, (4 parts), Draft versions dated in summer, 2003. Produced by Hong Kong Information Services Technology Division. Available at http://www.itsd.gov.hk/itsd/english/infra/eif.htm.
[IETFKeywords] Key Words for Use in RFCs to Indicate Requirement Level, 1997-03, obtainable at http://www.ietf.org/rfc/rfc2119.txt.
[ISO8601] International Standard Date and Time, 2001-11-10, produced by ISO. A web page that explains the formats is http://www.cl.cam.ac.uk/~mgk25/iso-time.html.
[ISO11179] ISO 11179 Part 5 - Naming and Identification, 1995-12, produced by ISO, obtainable at http://fdr.faa.gov/iso/ISO11179page.htm. A later version is available from the ISO website.
[UKGuide] e-Government Schema Guidelines for XML, 2002-12, produced by United Kingdom e-Envoy, obtainable at http://www.e-envoy.gov.uk/Resources/Guidelines/fs/en.
[Unicode] Unicode Charts, available at http://www.unicode.org/charts/.
[W3CSchemaDatatypes] W3C Schema Datatypes, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-2.
[W3CNamespaces] Namespaces in XML, 1999-01-14, produced by W3C, obtainable at http://www.w3.org/TR/REC-xml-names/.
[W3CSchemaPrimer] W3C Schema Primer, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-0.
[W3CSchemaStructures] W3C Schema Structures, 2001-05-02, produced by W3C, obtainable at http://www.w3.org/TR/xmlschema-1.
[Xlink] W3C XLink Specification, 2001-06, produced by W3C, obtainable at http://www.w3.org/TR/xlink/.
[Xpath] W3C XPath Specification, 1999, produced by W3C, obtainable at http://www.w3.org/TR/xpath/.
[XSL] W3C XSL and XSLT Specifications, produced by W3C, obtainable at http://www.w3.org/Style/XSL/.
POSC references are available in the following formats:
[html] html format readable by browsers
[doc] MS Word 97/2000/XP
[sxw] OpenOffice writer, v1.0
[IntroModule] Introduction to Modules, Copyright 2002-2003. Available in [html], [doc], [sxw].
[BuildModule] Build a Module - a tutorial. Copyright 2003. Available in [html], [doc], [sxw].
[ImportModule] Importing Modules within your Modules. Copyright 2003. Available in [html], [doc], [sxw].
[Guidelines] Guidelines for XML Schemas, Version 2003. Copyright 2003. Available in [html], [doc], [sxw].
[ModulePolicies] Policies on Modules. Copyright 2002-2003. Available in [html], [doc], [sxw].
[ProfilesAppSchema] Modules, Profiles, and Application Schemas. Copyright 2002-2003. Available in [html], [doc], [sxw].
[XMLTables] XML Tables. Copyright 2003. Available in [html], [doc], [sxw].
[ReferenceData] Reference Data and Enumerated Lists Implemented in XML. Copyright 2002-2003. Available in [html], [doc], [sxw].
[Dictionaries] Examples of XML Dictionary Usage. Copyright 2003. Available in [html], [doc], [sxw]. Accompanied by sample code.
[Relationships] Relationships in XML. Copyright 2003. Available in [html], [doc], [sxw].
[UOMRecs] Unit of Measure Recommendations. Copyright 2002-2003. Available in [html], [doc], [sxw].
2. The first two sections will be adopted, with modifications as noted in section 2. The third section of the UK Guide will not be adopted, since this section is more relevant to the particular practices of the group.
3. UK Guide comment: “In general, this means that architectural schemas will use a salami slice and/or Venetian blind style, while message schemas will use a Russian Doll style.” See the Best Practices Web Page [BestPractices] for a discussion of these styles.
4. Some architectures have a block, which is intermediate between an assembly and a component. The distinction among those three is not always clear. POSC will maintain the two concepts only.