[relaxng-user] Latest proposal for smart regexes in RELAX NG
David Tolpin
dvd at davidashen.net
Fri May 7 09:17:22 ICT 2004
> Your example covered only a subset of the RFC822 address production.
By the way, my example covers the full syntax of addr-spec in RFC 2822.
Jeffrey Friedl's regexp has bugs in it, but only because his purpose was
manifestly to impress an innocent reader by presenting a specimen
of programming style fit for an obfuscated programming contest.
Let me quote my example again.
Using XML Schema Datatype Library:
start=element addr-spec {
  xsd:token {
    pattern=
      "(\(([^\(\)\\]|\\.)*\) )?"
      ~ """([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|"([^"\\]|\\.)*")"""
      ~ "@"
      ~ "([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|\[([^\[\]\\]|\\.)*\])"
      ~ "( \(([^\(\)\\]|\\.)*\))?"
  }
}
With a small extension (implemented in RNV using an embedded Scheme interpreter):
datatypes dsl = "http://davidashen.net/relaxng/scheme-datatypes"
start=element addr-spec {
dsl:token {
s-pattern="""
comment = "\(([^\(\)\\]|\\.)*\)"
atom = "[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
atoms = atom "(\." atom ")*"
person = "\"([^\"\\]|\\.)*\""
location = "\[([^\[\]\\]|\\.)*\]"
local-part = "(" atoms "|" person ")"
domain = "(" atoms "|" location ")"
start = "(" comment " )?" local-part "@" domain "( " comment ")?"
"""
}
}
The only difference is that the same string-ish regexp is broken
into named parts.
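
The claim that the two forms are the same regexp can be checked
mechanically. Below is a minimal sketch in Python, not RNV's
implementation: it assumes plain string concatenation stands in for the
s-pattern rules, that local_part uses the dotted atoms form as the
single-string pattern does, and that Python's re module is close enough
to XML Schema regexps for this pattern (fullmatch mimics the implicit
anchoring of xsd patterns). The names flat, start and is_addr_spec are
hypothetical.

```python
import re

# The single-string pattern, copied segment by segment from the
# xsd:token example.
flat = (
    r"(\(([^\(\)\\]|\\.)*\) )?"
    r"""([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|"([^"\\]|\\.)*")"""
    r"@"
    r"([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|\[([^\[\]\\]|\\.)*\])"
    r"( \(([^\(\)\\]|\\.)*\))?"
)

# The same pattern built from named fragments, one per s-pattern rule.
comment = r"\(([^\(\)\\]|\\.)*\)"
atom = r"[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
atoms = atom + r"(\." + atom + ")*"
person = r'"([^"\\]|\\.)*"'
location = r"\[([^\[\]\\]|\\.)*\]"
local_part = "(" + atoms + "|" + person + ")"
domain = "(" + atoms + "|" + location + ")"
start = "(" + comment + " )?" + local_part + "@" + domain + "( " + comment + ")?"


def is_addr_spec(text):
    """Return True if the whole string matches the composed pattern."""
    return re.fullmatch(start, text) is not None


# The composed string is character-for-character the flat one.
assert start == flat
```

With these definitions, is_addr_spec("user@example.com") holds, while
is_addr_spec("a..b@example.com") does not, because every dot must be
followed by a non-empty atom.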
Rewrite it in XML syntax. Let us see which syntax is more readable.
David