[relaxng-user] "Converting DTDs (and DTD developers) to RELAX NG schemas" paper

James Clark jjc at jclark.com
Sat Nov 22 08:46:13 ICT 2003


On Fri, 2003-11-21 at 22:44, DuCharme, Bob (LNG-CHO) wrote:

> I've written a paper for XML 2003 on a technique for dual use of DTDs and
> RNG schemas suggested by James on the nxml list
> (http://groups.yahoo.com/group/emacs-nxml-mode/message/129). I've posted the
> paper at http://www.snee.com/bob/temp/06-04-05.html, and would love to hear
> suggestions from anyone who has a chance to read it in the next few days. 

A few comments:

1. You suggest doing RNG validation in addition to DTD validation. So
long as you do RNG DTD-compatibility checking, RNG validation alone is
enough (although I can well imagine non-technical reasons for doing DTD
validation).

2. With DTD validation, you can only check that the document is valid
with respect to the DTD that the document specifies.  In many cases,
even in a DTD-based environment, there is a requirement that the
document use a specific DTD, but DTD validation provides no way to
enforce this.  RNG validation helps with this too.

3. Apropos using these techniques with W3C XML Schema, if you use Trang
to convert from DTD to XSD, then your parameter entities will be
preserved, at least to some degree.

4. You say:

> To do this, however, the conversion utility parsing the DTD must know
> whether each parameter entity declares a piece of a content model or
> an attribute list, but to a DTD parser they're all just strings.

Although in a DTD-to-RNG conversion Trang converts parameter entities
only to definitions, it can't do this unless it can figure out what kind
of thing the parameter entity is representing.  This is what makes
preserving parameter entities so hard: to a DTD parser they are all
indeed just strings, but you need to know more than this to do any kind
of conversion, whether to RNG or XSD. (What Trang does approximately is
to expand the parameter entities, keeping track of which parameter
entities each character in the result comes from, then parse, then
unexpand those parameter entities that correspond consistently to some
syntactic unit.)
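
A toy sketch of that expand / track / unexpand idea, in Python. This is
not Trang's code. The names expand, is_unit and preservable are made up
for illustration; only content models are handled; the spans of the
outermost entities are recorded rather than a per-character map; and
"syntactic unit" is approximated as "aligned with token boundaries,
balanced parentheses, no dangling connector". It also ignores XML's own
rules for parameter entity expansion and assumes no recursive entities.

```python
import re

PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")
TOKEN = re.compile(r"[A-Za-z_#][\w.-]*|[()|,*+?]")


def expand(text, entities):
    """Expand %name; references in text.

    Returns (expanded, spans), where each span is (name, start, end) and
    records which slice of the expanded text came from which outermost
    parameter entity reference.
    """
    parts, spans, pos, length = [], [], 0, 0
    for m in PE_REF.finditer(text):
        literal = text[pos:m.start()]
        parts.append(literal)
        length += len(literal)
        # Nested references are expanded, but only the outer span is kept.
        body, _ = expand(entities[m.group(1)], entities)
        spans.append((m.group(1), length, length + len(body)))
        parts.append(body)
        length += len(body)
        pos = m.end()
    parts.append(text[pos:])
    return "".join(parts), spans


def is_unit(expanded, start, end):
    """Rough test: does expanded[start:end] look like a syntactic unit?"""
    tokens = [(m.start(), m.end(), m.group()) for m in TOKEN.finditer(expanded)]
    # A token that crosses either edge of the span means the entity
    # supplied only part of a name.
    if any(s < start < e or s < end < e for s, e, _ in tokens):
        return False
    inside = [t for s, e, t in tokens if s >= start and e <= end]
    if not inside:
        return False
    depth = 0
    for t in inside:
        if t == "(":
            depth += 1
        elif t == ")":
            depth -= 1
            if depth < 0:
                return False
    if depth != 0:
        return False
    # The span must not begin or end with a dangling operator.
    return inside[0] not in {"|", ",", "*", "+", "?"} and inside[-1] not in {"|", ","}


def preservable(models, entities):
    """Names of entities that form a unit in every place they are used."""
    verdict = {}
    for model in models:
        expanded, spans = expand(model, entities)
        for name, start, end in spans:
            verdict[name] = verdict.get(name, True) and is_unit(expanded, start, end)
    return {name for name, ok in verdict.items() if ok}


# Hypothetical declarations, for illustration only.
entities = {"inline": "em | strong", "tail": "| code)*"}
models = ["(#PCDATA | %inline;)*", "(#PCDATA | em %tail;"]

print(sorted(preservable(models, entities)))  # prints ['inline']
```

Here %inline; always expands to a complete list of alternatives, so it
could become a named definition, while %tail; closes a group it did not
open and so would have to stay expanded.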

James
