Comments and Questions on XML Exchange Format

Comments and questions (and answers) are kept in this area to avoid cluttering the main document. It is intended to be both an FAQ and a place to seek input from others. Please read and respond to the questions that are posed.

Contents:

  Should we rely on case sensitivity of tags?

Comment: Using uppercase for entities and lowercase for attributes places too much emphasis on case. I would prefer mixed case.

Answer: The section "Entity and Attribute Naming" discusses the problem. I will expand on it a little here.

Epicentre uses the same name for an entity and for an attribute which is a reference to that entity. For example, there is an entity, ellipsoid. The entity, geodetic_datum, has a reference to that entity. The attribute name is also ellipsoid.

While this is permitted in databases, it causes problems in XML. That is because element declarations in a DTD are global: a tag name can be declared only once. I cannot define a tag, ellipsoid, for use as an entity, and later define it again for use as an attribute. Thus, I have to find some way to distinguish between the entity, ellipsoid, and the attribute, ellipsoid.

I discuss four ways of doing so in the above-mentioned text. For the reasons described there, I chose option 3A: use upper case for entities and lower case for attributes. The two tags become ELLIPSOID and ellipsoid:

  <!ELEMENT ELLIPSOID (...list of its attribute tags...)>
  <!ELEMENT ellipsoid (instance | instance-ref)>
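Because XML names are case-sensitive, a parser treats ELLIPSOID and ellipsoid as two unrelated tags. The following Python sketch illustrates this with the standard library. The instance fragment is hypothetical: the content of each tag is left out, and the classify helper is an assumption made for illustration, not part of the exchange format.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: ELLIPSOID is an entity, and the lowercase
# ellipsoid is the attribute of GEODETIC_DATUM that refers to it.
SAMPLE = """
<ExchangeSet>
  <ELLIPSOID/>
  <GEODETIC_DATUM>
    <ellipsoid/>
  </GEODETIC_DATUM>
</ExchangeSet>
""".strip()

def classify(tag):
    """Option 3A: all-uppercase tags are entities, anything else is an attribute."""
    return "entity" if tag.isupper() else "attribute"

root = ET.fromstring(SAMPLE)
# Walk the tree in document order, skipping the mixed-case root tag.
kinds = [(elem.tag, classify(elem.tag)) for elem in root.iter() if elem is not root]
for tag, kind in kinds:
    print(tag, kind)
```

Any tool that folds tag names to a single case would lose this distinction, which is the risk the comment above is pointing at.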

Question: Is this the right choice? Who would prefer one of the other options?

 Why put in the data types?

Comment: It seems a bit much to put the string tag in. Why not just use #PCDATA? Why not use XML Schema to control the data types?

Answer: XML Schema has not yet been implemented widely enough for us to count on tools being available. It is safer to go with a DTD now and update it later, when XML Schema is more widely implemented. I have prepared for that by having a single key to identify each data type.

Answer: The other question is whether we should keep a key (tag) for each data type. In particular, six data types could be replaced with a simple #PCDATA: integer, real, string, binary, boolean, and logical. This would remove a layer for 99% of the data, and only the more complicated data types (generally timestamp, quantity, and location in Epicentre) would need a tag with a particular structure.

There are three reasons for keeping the tags. The first is explained in the first answer: I think it will be easier to move to XML Schema if we build the base structure on data types.

A second reason is to match the STEP 28 process more closely. As I mentioned, there is a standard way of going from an Express file to an XML file. I have given an example of one such instance in Appendix A. As you can see, it gives even more layers. In particular, it incorporates the ndt as a layer between the attribute tag and the data type tag. I did not want to get too far away from the STEP 28 process, so I kept the data type tag as part of the model.

The third reason is modularity. By basing the definitions on the data types, it is possible to develop software modules to handle them. For example, you could build Java classes to handle each data type. You could build XSL templates for each data type. These could be reused in all mappings from data models that use the same data type mappings.
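The kind of reusable module described above can be sketched in Python rather than Java or XSL. This is only an illustration under stated assumptions: the data type tag names come from the list given earlier, while the conversion rules and the attribute names (label, count, ratio) are hypothetical.

```python
import xml.etree.ElementTree as ET

# One small converter per data type tag. The tag names follow the data types
# named in the text; the conversion rules themselves are assumptions.
CONVERTERS = {
    "integer": int,
    "real": float,
    "string": lambda text: text,
    "boolean": lambda text: text.strip().lower() == "true",
}

def read_attribute(attr_elem):
    """Convert the single data type child of an attribute element to a Python value."""
    type_elem = attr_elem[0]                 # e.g. <real>0.5</real>
    convert = CONVERTERS[type_elem.tag]      # dispatch on the data type tag
    return convert(type_elem.text or "")

# Hypothetical attribute names; only the data type tags matter here.
entity = ET.fromstring(
    "<ELLIPSOID>"
    "<label><string>example</string></label>"
    "<count><integer>3</integer></count>"
    "<ratio><real>0.5</real></ratio>"
    "</ELLIPSOID>"
)
values = {attr.tag: read_attribute(attr) for attr in entity}
print(values)
```

The converter table knows nothing about the entities or attributes, so the same module could be reused for any data model that maps onto the same data type tags.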

Another type of modularity is in the DTDs. I use four DTDs to make up the complete DTD. The lowest level - the leaf tags - consists of the data types. This DTD is basically a cut and paste from the document. The highest level is the root tag, which contains the PEF Objects (also defined in the document) and the tag ExchangeSet:

  <!ELEMENT ExchangeSet ANY>

It is intended that the children of the exchange set will be all of the entities of the model.

The other two DTDs (the second and third levels) are derived from a data model. The second level consists of the entities, whose children are the attributes. A simple script can start with a DDL or an Express file and create such a DTD.
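A minimal sketch of such a script is shown here in Python. It assumes the DDL or Express file has already been parsed into a dictionary of entity names and attribute names; that parsing step, the sample attribute names other than ellipsoid, and the choice of a repeated-choice content model are all assumptions made for illustration.

```python
def entity_dtd(model):
    """Emit one <!ELEMENT> declaration per entity, with its attributes as children."""
    lines = []
    for entity, attributes in model.items():
        if not attributes:
            lines.append(f"<!ELEMENT {entity.upper()} EMPTY>")
            continue
        # Assumption: every attribute is optional and order does not matter,
        # so the content model is a repeated choice rather than a strict sequence.
        children = " | ".join(attributes)
        lines.append(f"<!ELEMENT {entity.upper()} ({children})*>")
    return "\n".join(lines)

# Hypothetical, already-parsed data model: entity name -> attribute names.
MODEL = {
    "ellipsoid": ["identifier", "semi_major_axis"],
    "geodetic_datum": ["identifier", "ellipsoid"],
}
dtd_text = entity_dtd(MODEL)
print(dtd_text)
```

Entity names are written in upper case and attribute names are left in lower case, following option 3A from the first question.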

The third level consists of the attribute tags. Again, it is generated quite simply from the data model, with a little hand editing to handle duplicate attributes. The children of an attribute are its data type(s).
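The attribute level could be generated along the following lines. This Python sketch assumes a parsed data model that maps each entity to its attributes and their data type tags; the entity and attribute names are hypothetical. Merging the types of a duplicated attribute name into one choice is one possible policy, not necessarily the one used for the real DTD, so such names are also reported for hand editing.

```python
from collections import defaultdict

def attribute_dtd(model):
    """Emit one global <!ELEMENT> declaration per attribute name.

    model maps entity name -> {attribute name: data type tag}.
    Returns the DTD text and the names that appear with more than one data type.
    """
    types_by_attribute = defaultdict(list)
    for attributes in model.values():
        for name, data_type in attributes.items():
            if data_type not in types_by_attribute[name]:
                types_by_attribute[name].append(data_type)

    lines, needs_review = [], []
    for name in sorted(types_by_attribute):
        types = types_by_attribute[name]
        if len(types) > 1:
            needs_review.append(name)    # candidate for hand editing
        lines.append(f"<!ELEMENT {name} ({' | '.join(types)})>")
    return "\n".join(lines), needs_review

# Hypothetical model: the attribute "depth" has a different type in each entity.
MODEL = {
    "well": {"identifier": "string", "depth": "real"},
    "sample": {"identifier": "string", "depth": "quantity"},
}
dtd_text, needs_review = attribute_dtd(MODEL)
print(dtd_text)
print("review by hand:", needs_review)
```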

If you remove the string, real, ..., tags, you begin mixing the third and fourth levels. While this is not critical, it does make the DTDs harder to generate and makes the data a bit more difficult for applications to handle.
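To illustrate the extra work for applications, the Python sketch below reads an attribute in both forms. The attribute name label and the read_value helper are hypothetical. With the data type tag, the type travels with the data; without it, the reader needs a second code path and has to learn the type from somewhere outside the document.

```python
import xml.etree.ElementTree as ET

def read_value(attr_elem):
    """Return (data type tag or None, text) for an attribute element."""
    children = list(attr_elem)
    if children:
        # Typed form: the data type tag is the single child of the attribute.
        return children[0].tag, children[0].text
    # Untyped form: plain #PCDATA, so the document itself does not give the type.
    return None, attr_elem.text

typed = ET.fromstring("<label><string>abc</string></label>")
untyped = ET.fromstring("<label>abc</label>")
print(read_value(typed))
print(read_value(untyped))
```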

Question: Is this approach better? Or would people prefer to drop the tags for those six data types?


Last Modified: November 14, 2000
© Copyright 2000 POSC. All rights reserved.