[relaxng-user] preferred implementation for error reporting

Daniel Veillard veillard at redhat.com
Fri Mar 11 08:22:07 ICT 2005

On Fri, Mar 11, 2005 at 12:18:13PM +0000, Sebastian Rahtz wrote:
> I've found that MSV gives the most verbose reports when I am stuck. I
> also find it helpful to run trang and convert to XSD, and see what that
> says. For the reasons that Daniel gives, I find xmllint the least helpful
> for complex content model debugging, wonderful though it is in most ways.

  I can try to explain a bit why that is. Basically, to take the example
I provided, libxml2 will first start to validate foo on foo. This will
fail due to the missing attribute, so xmllint will stack that error and
roll back to the second branch of the choice. It can't raise the error
immediately because, well, the second choice may work: unlike XSD, RNG
has no determinism rule telling the validator that once foo has been
matched on foo no other validation branch may match it. Then the second
choice is attempted, fails, and that adds to the stack of errors. It is
only when you pop up far enough to get outside of the non-deterministic
content model that the error can be raised with certainty.
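
  To make the stacking behaviour concrete, here is a minimal Python sketch
of a backtracking validator. It is not libxml2's code: the classes, the
"id" attribute and the foo/bar schema are hypothetical, chosen only to
mirror the shape of the example (a choice whose first branch fails on a
missing attribute).

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A document node: just a name and its attributes (hypothetical model)."""
    name: str
    attrs: dict = field(default_factory=dict)


@dataclass
class ElementPattern:
    """Schema pattern: an element with some required attributes."""
    name: str
    required_attrs: tuple = ()


@dataclass
class ChoicePattern:
    """Schema pattern: any one of the branches may match."""
    branches: tuple


def validate(pattern, node, errors):
    """Return True on a match; on failure, push messages onto `errors`."""
    if isinstance(pattern, ChoicePattern):
        mark = len(errors)
        for branch in pattern.branches:
            if validate(branch, node, errors):
                # One branch matched, so the errors stacked by the
                # branches that failed before it are discarded.
                del errors[mark:]
                return True
        # Every branch failed: all stacked errors survive, and only the
        # caller outside the choice can report them.
        return False
    if pattern.name != node.name:
        errors.append(f"expecting element {pattern.name}, got {node.name}")
        return False
    for attr in pattern.required_attrs:
        if attr not in node.attrs:
            errors.append(f"element {node.name} is missing attribute {attr}")
            return False
    return True


# Assumed schema: either <foo id="..."/> or <bar/>.
schema = ChoicePattern((ElementPattern("foo", ("id",)), ElementPattern("bar")))

errors = []
ok = validate(schema, Element("foo"), errors)
print(ok, errors)
```

Validating a foo without the attribute leaves two messages on the stack,
one per branch, and nothing in the stack says which of the two the author
of the instance actually meant.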

  On the other hand, jing's derivative-based algorithm keeps a tree
of what ought to validate the current content as it progresses, so when it
gets stuck it has a complete view of the expression which failed to match
and, at least theoretically, can provide far better error reporting.
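
  A rough Python sketch of the derivative idea, for comparison. This is
not jing's implementation: it only handles sequences of child element
names (no attributes, text, interleave or nesting), and every name in it
is made up for illustration. The point is that the residual pattern is
still in hand at the moment validation gets stuck, so the message can say
what was expected.

```python
from dataclasses import dataclass


class Pattern:
    pass


@dataclass(frozen=True)
class Empty(Pattern):
    """Matches the empty sequence."""


@dataclass(frozen=True)
class NotAllowed(Pattern):
    """Matches nothing at all."""


@dataclass(frozen=True)
class Elem(Pattern):
    name: str


@dataclass(frozen=True)
class Group(Pattern):
    first: Pattern
    second: Pattern


@dataclass(frozen=True)
class Choice(Pattern):
    left: Pattern
    right: Pattern


def nullable(p):
    """True if p matches the empty sequence."""
    if isinstance(p, Empty):
        return True
    if isinstance(p, Group):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    return False


def choice(a, b):
    """Build a choice, dropping branches that can no longer match."""
    if isinstance(a, NotAllowed):
        return b
    if isinstance(b, NotAllowed):
        return a
    return Choice(a, b)


def group(a, b):
    """Build a group, simplifying the trivial cases."""
    if isinstance(a, NotAllowed) or isinstance(b, NotAllowed):
        return NotAllowed()
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return Group(a, b)


def deriv(p, name):
    """The pattern that must match what follows after consuming `name`."""
    if isinstance(p, Elem):
        return Empty() if p.name == name else NotAllowed()
    if isinstance(p, Choice):
        return choice(deriv(p.left, name), deriv(p.right, name))
    if isinstance(p, Group):
        d = group(deriv(p.first, name), p.second)
        if nullable(p.first):
            return choice(d, deriv(p.second, name))
        return d
    return NotAllowed()


def expected(p):
    """Element names that the pattern would accept next."""
    if isinstance(p, Elem):
        return {p.name}
    if isinstance(p, Choice):
        return expected(p.left) | expected(p.right)
    if isinstance(p, Group):
        names = expected(p.first)
        if nullable(p.first):
            names |= expected(p.second)
        return names
    return set()


def validate(pattern, names):
    """Return None if `names` matches, else a message naming what was expected."""
    for i, name in enumerate(names):
        nxt = deriv(pattern, name)
        if isinstance(nxt, NotAllowed):
            return f"child {i}: found {name}, expected one of {sorted(expected(pattern))}"
        pattern = nxt
    if not nullable(pattern):
        return f"content ended early, expected one of {sorted(expected(pattern))}"
    return None


# Assumed content model: title, then either para or list.
schema = Group(Elem("title"), Choice(Elem("para"), Elem("list")))
print(validate(schema, ["title", "table"]))
```

Both branches of the choice are carried along in the residual pattern, so
there is no rollback and no stack of per-branch errors to untangle.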

  Now when you convert to XSD, you are constraining the validator to a
deterministic content model, and when that is possible it means the
validator, as I explained, has a far easier job reporting the errors
because it can do so immediately and in context. You are trading better
error reporting against the risk that the conversion to a deterministic
content model fails.

  One last point about libxml2: it actually tries to convert as much RNG as
possible to deterministic regexp content model validation (for space and
efficiency), so when faced with simple RNG content you may actually get
XSD-like error reporting from libxml2 RNG validation.

  For those contemplating how to torture RNG validator error reporting:

"From James Clark  2003-03-25 23:48 -------

Here's another example of not so good error recovery/reporting.  The
following is  a schema that matches a document if and only if the
document contains at least one element whose content is "atribute". 
If I validate tutorial-bad.xml with this, I get over 8000 lines of
error messages."

  The flexibility of Relax-NG is such that if you have 8000 elements in the
validated file, they are actually all in error, and it would suffice for
one of them to be right for the document to validate. It is hard for a
state-based system to extract what James summarized from the schema. An
ideal error reporting system would print:

   Validation failure: no node in the document has content "atribute"

But getting there really sounds like an AI project. xmllint will just report
a content model failure, and I assume jing raises the problem for one specific
node. See the URL if you want to get the RNG source.
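
  For what it is worth, the property James describes is trivial to state
as a direct check on the document, which is what makes the gap so
frustrating. The Python sketch below is not derived from the RNG source
and is not how any validator works: it is a hand-written predicate, and
the function name and the whitespace stripping are my assumptions.

```python
import xml.etree.ElementTree as ET


def check_has_content(xml_text, wanted="atribute"):
    """Return None if some childless element's text equals `wanted`,
    else the one-line message an ideal validator would print."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Assumption: surrounding whitespace is ignored when comparing.
        if len(elem) == 0 and (elem.text or "").strip() == wanted:
            return None
    return f'Validation failure: no node in the document has content "{wanted}"'


print(check_has_content("<doc><a>attribute</a><b>other</b></doc>"))
```

A general validator only sees the schema's patterns, not this summary, so
it has no easy way to produce the single line above.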

  My conclusion is that the main drawback of RNG's extreme versatility
  and flexibility is how hard it is to extract humanly understandable
  error messages when validation errors occur.

The good point is that with RNG if you get an error from a validator it 
usually means the instance is in error, not that the validator is in error :-)


Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard at redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
