[relaxng-user] preferred implementation for error reporting
veillard at redhat.com
Fri Mar 11 08:22:07 ICT 2005
On Fri, Mar 11, 2005 at 12:18:13PM +0000, Sebastian Rahtz wrote:
> I've found that MSV gives the most verbose reports when I am stuck. I
> also find it helpful to run trang and convert to XSD, and see what that
> says. For the reasons that Daniel gives, I find xmllint the least helpful
> for complex content model debugging, wonderful though it is in most ways.
I can try to explain a bit why it is so. Basically, to take the example
I provided, libxml2 will first start to validate foo on foo; this will
fail due to the missing attribute, so xmllint will stack that error and
roll back to the second branch of the choice. It can't raise the error
immediately because the second choice may well work: unlike XSD, RNG has
no determinism rule telling the validator that once foo has been
matched on foo no other validation branch may match it. Then the
second choice is attempted, fails, and that adds to the stack of errors.
It is only when you pop up far enough to get outside of a
non-deterministic content model that the error can be raised with certainty.
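
To make the stack-and-rollback behaviour concrete, here is a minimal Python
sketch. It is a toy model and not libxml2 code: the element representation,
the attribute names "a" and "b", and the functions match_element and
match_choice are all assumptions made up for illustration.

```python
# Toy model, NOT libxml2 code: an element is (name, set of attribute names),
# a pattern is (name, set of required attribute names). All names are
# hypothetical.

def match_element(pattern, element):
    """Return a list of error strings; an empty list means the pattern matched."""
    name, required_attrs = pattern
    el_name, el_attrs = element
    if name != el_name:
        return [f"expected element {name}, got {el_name}"]
    return [f"element {name}: missing attribute {attr}"
            for attr in sorted(required_attrs - el_attrs)]


def match_choice(branches, element):
    """Try each branch in turn, stacking errors instead of raising them."""
    stacked = []
    for index, pattern in enumerate(branches, start=1):
        errors = match_element(pattern, element)
        if not errors:
            return []  # one branch matched: the stacked errors are discarded
        stacked.extend(f"branch {index}: {e}" for e in errors)
    return stacked  # every branch failed: only now can the errors be reported


# Two branches that both start with an element named foo.
branches = [("foo", {"a"}), ("foo", {"b"})]
ok = match_choice(branches, ("foo", {"b"}))      # second branch matches
failed = match_choice(branches, ("foo", set()))  # both branches fail
```

In the failing case the caller gets one stacked error per branch, with no
indication of which branch the author of the instance had in mind.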
On the other hand, jing's derivative-based algorithm keeps a tree
of what ought to validate the current content as it progresses, so when it
gets stuck it has a complete view of the expression which failed to match
and, at least theoretically, can provide far better error reporting.
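
As a rough illustration of the derivative idea, here is a small Python
sketch that matches a sequence of element names against a pattern. It is
not jing's implementation: the tuple encoding of patterns and the helper
names (elem, choice, group, deriv, expected, validate) are assumptions
chosen for this example, and attributes, text and nesting are left out.

```python
# Toy derivative matcher over a flat sequence of element names.
# NOT jing's code; the pattern encoding and helper names are hypothetical.
EMPTY = ("empty",)
NOT_ALLOWED = ("notAllowed",)


def elem(name):
    return ("elem", name)


def choice(p, q):
    if p == NOT_ALLOWED:
        return q
    if q == NOT_ALLOWED:
        return p
    return ("choice", p, q)


def group(p, q):
    if NOT_ALLOWED in (p, q):
        return NOT_ALLOWED
    if p == EMPTY:
        return q
    if q == EMPTY:
        return p
    return ("group", p, q)


def nullable(p):
    """True if the pattern matches the empty sequence."""
    kind = p[0]
    if kind == "empty":
        return True
    if kind == "choice":
        return nullable(p[1]) or nullable(p[2])
    if kind == "group":
        return nullable(p[1]) and nullable(p[2])
    return False  # elem, notAllowed


def deriv(p, name):
    """Pattern that must match what follows once `name` has been consumed."""
    kind = p[0]
    if kind == "elem":
        return EMPTY if p[1] == name else NOT_ALLOWED
    if kind == "choice":
        return choice(deriv(p[1], name), deriv(p[2], name))
    if kind == "group":
        first = group(deriv(p[1], name), p[2])
        if nullable(p[1]):
            return choice(first, deriv(p[2], name))
        return first
    return NOT_ALLOWED  # empty, notAllowed


def expected(p):
    """Element names the pattern could accept next."""
    kind = p[0]
    if kind == "elem":
        return {p[1]}
    if kind == "choice":
        return expected(p[1]) | expected(p[2])
    if kind == "group":
        names = expected(p[1])
        if nullable(p[1]):
            names = names | expected(p[2])
        return names
    return set()


def validate(pattern, names):
    """Return None on success, or a message built from the remaining pattern."""
    for position, name in enumerate(names):
        following = deriv(pattern, name)
        if following == NOT_ALLOWED:
            return (f"item {position}: got {name}, "
                    f"expected one of {sorted(expected(pattern))}")
        pattern = following
    if not nullable(pattern):
        return f"end of input: expected one of {sorted(expected(pattern))}"
    return None


# Non-deterministic choice: both branches start with "a".
schema = choice(group(elem("a"), elem("b")), group(elem("a"), elem("c")))
good = validate(schema, ["a", "b"])
bad = validate(schema, ["a", "d"])
```

Because both branches are carried forward in a single pattern, the failure
on "d" can name everything that was still acceptable at that point, without
any rollback.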
Now when you convert to XSD, you are constraining the validator to a
deterministic content model, and when that conversion is possible the
validator, as I explained, has a far easier job reporting errors because
it can do it immediately and in context. You are trading better error
reporting against the risk that the conversion to a deterministic content
model fails.
Last point about libxml2: it actually tries to convert as much RNG as possible
to deterministic regexp content model validation (for space and efficiency),
so when faced with simple RNG content you may actually get XSD-like
error reporting from libxml2 RNG validation.
For those contemplating how to torture RNG validator error reporting
"From James Clark 2003-03-25 23:48 -------
Here's another example of not so good error recovery/reporting. The
following is a schema that matches a document if and only if the
document contains at least one element whose content is "atribute".
If I validate tutorial-bad.xml with this, I get over 8000 lines of
The flexibility of Relax-NG is such that if you have 8000 elements in the
validated file, they are actually all in error, and it suffices that one be
right to get validation. It is hard for a state-based system to extract
what James summarized from the schema. An ideal error reporting system
would say:
  Validation failure: no node in the document has content "atribute"
But getting there really sounds like an AI project. xmllint will just report
a content model failure, and I assume jing raises the problem for one specific
node. See the URL if you want to get the RNG source.
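
For comparison, the condition James describes is trivial to state when
written by hand. The following Python sketch is not derived from his RNG
schema (which is not reproduced here); the function name check and the
sample documents are assumptions, and "content" is read as an element with
no children whose text is exactly the wanted string.

```python
import xml.etree.ElementTree as ET


def check(xml_text, wanted="atribute"):
    """Return None if some element has exactly `wanted` as its content,
    otherwise the single summary message an ideal validator would give."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # Assumption: "content" means a childless element with this exact text.
        if len(el) == 0 and (el.text or "") == wanted:
            return None
    return f'Validation failure: no node in the document has content "{wanted}"'


passing = check("<doc><x>atribute</x><y>other</y></doc>")
failing = check("<doc><x>attribute</x><y>other</y></doc>")
```

The difficulty is not computing this check but recovering such a summary
automatically from an arbitrary schema.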
My conclusion is that the main drawback of RNG's extreme versatility
and flexibility is how hard it is to extract humanly understandable
error messages when validation errors occur.
The good point is that with RNG if you get an error from a validator it
usually means the instance is in error, not that the validator is in error :-)
Daniel Veillard | Red Hat Desktop team http://redhat.com/
veillard at redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/