[relaxng-user] Latest proposal for smart regexes in RELAX NG
jcowan at reutershealth.com
Wed Apr 28 17:11:28 ICT 2004
The idea of this version, which isn't too different from the versions
I've put together before, is that it embeds cleanly into RNG, using a
separate namespace so that everything is a proper RNG foreign element.
Specifically, any single element that matches the "regex" rule below may
appear as a child of an RNG "data" element. That way, processors that do
not understand regexes simply skip it. Here's the schema:
namespace rx = "urn:x-rng:rx" # to be changed
# Basic components: width-one and width-zero objects:
one = element rx:one {xsd:string {length = "1"}}
boundary = element rx:boundary {
attribute type {"bos" | "eos" | "bol" | "eol" | "bow" | "eow"}
# beginning/end of string, line, word
}
# Iterators and operators
zeroOrMore = element rx:zeroOrMore {regex+}
oneOrMore = element rx:oneOrMore {regex+}
optional = element rx:optional {regex+}
choice = element rx:choice {regex+}
group = element rx:group {regex+}
# Conveniences
\string = element rx:string {string} # must match this string
charset = element rx:charset {string} # matches one char, any of these
class = element rx:class { # matches one char of a named class
attribute name {xsd:NCName}
}
word = element rx:word {regex+}
# Character set operations
complement = element rx:complement {cset+} # complement of union
difference = element rx:difference {cset+}
intersect = element rx:intersect {cset+}
union = element rx:union {cset+}
range = element rx:range {string} # from r[0] to r[1], r[2] to r[3], etc.
# Reference to defined regex
# Regexes are defined using define/data/rx:*
ref = element rx:ref {
attribute name {xsd:NCName}
}
# Escape hatch
pattern = element rx:pattern {string} # Posix regex string
# Content models
regex = boundary | zeroOrMore | oneOrMore | optional | choice | group |
\string | word | cset | ref | pattern
cset = one | charset | class | complement | difference |
union | intersect | range
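
To make the embedding concrete, here is a rough Python sketch (not part of the proposal itself). It parses a hypothetical "data" element carrying one rx:group child and translates a subset of the vocabulary into a Python `re` pattern. The sample instance, the helper names (`local`, `seq`, `to_regex`), the mapping of boundary types onto Python anchors, the reading of several children as a sequence, and passing rx:pattern through unchanged are all assumptions made for illustration. The elements class, word, ref, and the set operations are deliberately left unimplemented.

```python
import re
import xml.etree.ElementTree as ET

RX = "urn:x-rng:rx"  # placeholder namespace from the schema ("to be changed")
RNG = "http://relaxng.org/ns/structure/1.0"

# Hypothetical instance: an RNG <data> pattern with one rx:* foreign child.
SAMPLE = f"""
<data xmlns="{RNG}" xmlns:rx="{RX}" type="string"
      datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <rx:group>
    <rx:boundary type="bos"/>
    <rx:oneOrMore><rx:range>az</rx:range></rx:oneOrMore>
    <rx:optional><rx:string>-x</rx:string></rx:optional>
    <rx:boundary type="eos"/>
  </rx:group>
</data>
""".strip()

# Assumed mapping of boundary types onto Python anchors
# (bol/eol only behave per line when compiled with re.MULTILINE).
BOUNDARY = {
    "bos": r"\A", "eos": r"\Z",
    "bol": "^", "eol": "$",
    "bow": r"\b(?=\w)", "eow": r"\b(?<=\w)",
}
SUFFIX = {"zeroOrMore": "*", "oneOrMore": "+", "optional": "?"}


def local(el):
    """Return the local name of an rx:* element, rejecting anything else."""
    ns, _, name = el.tag[1:].partition("}")
    if ns != RX:
        raise ValueError(f"not an rx element: {el.tag}")
    return name


def seq(el):
    """Assumption: several children of one operator match in sequence."""
    return "".join(to_regex(child) for child in el)


def to_regex(el):
    """Translate a subset of the rx vocabulary into Python re syntax."""
    name = local(el)
    text = el.text or ""
    if name == "boundary":
        return BOUNDARY[el.get("type")]
    if name in ("one", "string"):
        return re.escape(text)
    if name == "charset":
        return "[" + "".join(re.escape(c) for c in text) + "]"
    if name == "range":
        # r[0] to r[1], r[2] to r[3], etc., so the length must be even.
        if not text or len(text) % 2:
            raise ValueError("rx:range needs an even, non-zero length")
        pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
        body = "".join(f"{re.escape(a)}-{re.escape(b)}" for a, b in pairs)
        return "[" + body + "]"
    if name == "pattern":
        return f"(?:{text})"  # assumption: passed through untranslated
    if name == "choice":
        return "(?:" + "|".join(to_regex(child) for child in el) + ")"
    if name == "group":
        return f"(?:{seq(el)})"
    if name in SUFFIX:
        return f"(?:{seq(el)}){SUFFIX[name]}"
    raise NotImplementedError(f"rx:{name} is not handled in this sketch")


data = ET.fromstring(SAMPLE)
# Exactly one rx:* element is expected as a child of <data>.
(rx_child,) = [c for c in data if c.tag.startswith("{" + RX + "}")]
compiled = re.compile(to_regex(rx_child))
```

With the sample above, `compiled.match("abc")` and `compiled.match("abc-x")` succeed, while `compiled.match("ABC")` does not. A processor that knows nothing about the rx namespace would simply see an ordinary "data" pattern of type string.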
--
Winter:  MIT,                   John Cowan
Keio, INRIA,                    jcowan at reutershealth.com
Issue lots of Drafts.           http://www.ccil.org/~cowan
So much more to understand!     http://www.reutershealth.com
Might simplicity return?        (A "tanka", or extended haiku)