[relaxng-user] RNC and repeatedPrimary

Bob Foster bob at objfac.com
Sat May 8 13:30:35 ICT 2004

Jeff Rafter wrote:
> With that said, I can't figure out a way to make a single pass through an
> RNC grammar without either (a) buffering an unknown number of tokens in
> memory, or (b) buffering the resulting grammar. It is likely academic
> though-- multiple passes are probably fine, as is storing the result
> document/grammar in memory. Luckily RNC grammars are compact.
> Of course, the folks on this list know far more than I do about parsing-- so
> if there is a way to do it without (a) or (b) then I would love to hear it
> (links to various approaches would be very welcome).

Well, you're going to have to store something, because you don't know 
what to generate for a pattern until you've seen the entire pattern. 
(Unlike a define or a div, where you know everything you need to know 
after parsing the first two tokens.) But you should be able to discard 
whatever you store for a pattern after you've generated events for it.

(I'm assuming your translator generates SAX events, or something like them.)

The classical solution for expressions like pattern is to produce a 
parse tree, where the nodes are in the order needed for generation. 
Thus, an expression like:

(a | b)*

would produce a tree like:

       /  \
     id    id
      |     |
     "a"   "b"

from which you can generate

     <ref name="a"/>
     <ref name="b"/>

during a simple pre-order walk.

Producing a parse tree is very easy in a top-down parser if every 
matched rule returns a node.

As to references, I don't know exactly where to start, except of course, 
the dragon book. Most people don't write parsers these days, but 
generate them from annotated grammars using a "compiler-compiler" like 
JavaCC or ANTLR. (I'm pretty sure YACC would have problems with the RNC 
grammar.) Google will find all of these for you.

Bob Foster

More information about the relaxng-user mailing list